Wednesday, March 13, 2024

Why To Avoid Hash Algorithms If What You Really Need Is A Checksum

Sometimes we hear that someone plans to use a hash function for fault detection instead of a checksum. This is probably not the best idea, because hashes provide weaker fault detection for random independent bit faults than a decent checksum.

A hash function is typically optimized for really good bit mixing, but not for ensuring that small numbers of bit faults are always detected. For a Pud effectiveness metric (random independent bit faults at a particular BER), this means that a hash function is likely to do worse than a good checksum.

To explore this, we evaluated the hash function Murmur3 (32 bits) on a Monte Carlo simulation of random independent bit faults and compared it with a variety of checksums:

Monte Carlo simulations of 32-bit checksum and hash functions, BER=1e-12

The results show that Murmur3 is an excellent approximation of a Simple HD=2 curve, which detects all 1-bit faults, and has a 1/(2**32) probability of undetected fault for all other numbers of bit faults. That is expected of a good hash function. Murmur3 turns out to have Hamming Distance=2, at least for the lengths we investigated.

In comparison, a 32-bit one's complement checksum does worse, but that is not a huge surprise.

All the other checksums did better.  Dual-sum and Koopman checksums provided HD=3, although with less mixing. Their curves are below Murmur3 but not all the way down at the Simple HD=3 curve.

DualX-32P and Koopman-32P both have curves above the Simple HD=4 curve. CRC curves were not plotted, but would be at least as good as the Koopman-32P curve, and potentially much better for short data word lengths where a higher HD can be provided.

Summarizing, Murmur3 does a better job of bit mixing than the checksum functions, but suffers in ability to detect random independent bit flips due to being limited to HD=2.

We expect that other good hash functions will trace out either an approximate Simple HD=2 curve, or possibly a Simple HD=1 curve depending on their specifics. A dual-sum checksum, a Koopman checksum, or a CRC will all provide significantly better fault detection.

These curves are for random independent bit faults. For memory arrays sometimes people are concerned with multi-bit single event upsets. The answer to how these will perform will depend on the specifics of the geometry of the memory array (for example, row size and whether columns are interleaved). Checksums and CRCs will generally be good at multi-bit faults in bits that are adjacent in the data word. And the 32-P checksums will detect all 1-, 2-, and 3-bit faults regardless of the bit position. Beyond that is difficult to know without analysis tailored to the specific situation of the memory array you care about as well as the comparative frequency and patterns of different numbers of bit faults.

Wednesday, February 28, 2024

Comparative speeds for different Checksum & CRC implementations

Comparative speeds for different Checksum & CRC implementations.

For more information see my new book:  Understanding Checksums and Cyclic Redundancy Checks -- which includes downloadable open source example implementations for the checksums mentioned below.  This is a preview of section 12.1 from that book.

Monday, February 19, 2024

Book: Understanding Checksums and Cyclic Redundancy Checks

Amazon e-book:   (Also available from country-specific Amazon web sites for:  UK, DE, FR, ES, IT, NL, JP, BR, CA, MX, AU, IN).  

Updated, permanent post with more info here:

Monday, January 1, 2024

Explanation of CRC Operation: Polynomial Division, Shift Register, and Comparison (video)

 Here is a video explaining the mechanics of Cyclic Redundancy Check computations, including:

  • An example of the polynomial division approach
  • An example of the XOR-feedback shift register approach
  • A step-by-step comparison of how the polynomial division and shift register approaches are actually doing exactly the same computation.
If you've ever wondered why it is that a shift register and polynomial division end up with the same result, this video explains it.

FULL VIDEO:   YouTube |

The full video consists of these four parts: (YouTube Play List):
  • Part 1: Example CRC Division to compute check value (YouTube 4:25)
  • Part 2: Example CRC Validation via Polynomial Division (YouTube 3:30)
  • Part 3: Example CRC Computation via Hardware Shift Register (YouTube 8:10)
  • Part 4: Comparison of CRC Polynomial Division and Hardware Shift Register (YouTube 6:21)

Thursday, December 28, 2023

New, faster version of HDLEN

 A new, faster version of hdlen is now available (version 2.0.0).

This uses a hash table to dramatically speed up determining the maximum data word length for each Hamming Distance for any particular CRC polynomial.

More info here:

Wednesday, May 10, 2023

Koopman Checksum

An Improved Modular Addition Checksum Algorithm

Philip Koopman

This paper introduces a checksum algorithm that provides a new point in the performance/complexity/effectiveness checksum tradeoff space. It has better fault detection properties than single-sum and dual-sum modular addition checksums. It is also simpler to compute efficiently than a cyclic redundancy check (CRC) due to exploiting commonly available hardware and programming language support for unsigned integer division. The key idea is to compute a single running sum, but introduce a left shift by the size (in bits) of the modulus before performing the modular reduction after each addition step. This approach provides a Hamming Distance of 3 for longer data word lengths than dual-sum approaches such as the Fletcher checksum. Moreover, it provides this capability using a single running sum that is only twice the size of the final computed check value, while providing fault detection capabilities even better than large-block variants of dual-sum approaches that require larger division operations. A variant that includes a parity bit achieves Hamming Distance 4 for about half the data word length as the baseline version for the same size check value.

Full paper here:

Below is a simplistic comparison of execution time for 32-bit checksums. Every system will have a different outcome, but this gives a general idea of what to expect. Koopman checksums have about the same run time as a Fletcher or Adler checksum, and likely have slightly faster run time for a machine such as an Intel server that has strong hardware support for division. (The DualSum32 implementation includes an optimization of delaying the SumB modulo operation until the end of the computation.) Nonetheless, Koopman checksums provide better Hamming Distance performance.  CRC_R32T is a CRC implementation that uses a 256-element lookup table to speed up the computation.

Saturday, May 6, 2023

Large-Block Checksums

Large-Block Modular Addition Checksum Algorithms

Philip Koopman

Checksum algorithms are widely employed due to their use of a simple algorithm with fast computational speed to provide a basic detection capability for corrupted data. This paper describes the benefits of adding the design parameter of increased data block size for modular addition checksums, combined with an empirical approach to modulus selection. A longer processing block size with the right modulus can provide significantly better fault detection performance with no change in the number of bytes used to store the check value. In particular, a large-block dual-sum approach provides Hamming Distance 3-class fault detection performance for many times the data word length capability of previously studied Fletcher and Adler checksums. Moduli of 253 and 65525 are identified as being particularly effective for general-purpose checksum use.

Download the full paper here:

Why To Avoid Hash Algorithms If What You Really Need Is A Checksum

Sometimes we hear that someone plans to use a hash function for fault detection instead of a checksum. This is probably not the best idea, b...