Wednesday, March 13, 2024

Why To Avoid Hash Algorithms If What You Really Need Is A Checksum

Sometimes we hear that someone plans to use a hash function for fault detection instead of a checksum. This is probably not the best idea, because hashes provide weaker fault detection for random independent bit faults than a decent checksum.

A hash function is typically optimized for really good bit mixing, but not for ensuring that small numbers of bit faults are always detected. For a Pud effectiveness metric (random independent bit faults at a particular BER), this means that a hash function is likely to do worse than a good checksum.

To explore this, we evaluated the hash function Murmur3 (32 bits) on a Monte Carlo simulation of random independent bit faults and compared it with a variety of checksums:

Monte Carlo simulations of 32-bit checksum and hash functions, BER=1e-12

The results show that Murmur3 is an excellent approximation of a Simple HD=2 curve, which detects all 1-bit faults, and has a 1/(2**32) probability of undetected fault for all other numbers of bit faults. That is expected of a good hash function. Murmur3 turns out to have Hamming Distance=2, at least for the lengths we investigated.

In comparison, a 32-bit one's complement checksum does worse, but that is not a huge surprise.

All the other checksums did better.  Dual-sum and Koopman checksums provided HD=3, although with less mixing. Their curves are below Murmur3 but not all the way down at the Simple HD=3 curve.

DualX-32P and Koopman-32P both have curves above the Simple HD=4 curve. CRC curves were not plotted, but would be at least as good as the Koopman-32P curve, and potentially much better for short data word lengths where a higher HD can be provided.

Summarizing, Murmur3 does a better job of bit mixing than the checksum functions, but suffers in ability to detect random independent bit flips due to being limited to HD=2.

We expect that other good hash functions will trace out either an approximate Simple HD=2 curve, or possibly a Simple HD=1 curve depending on their specifics. A dual-sum checksum, a Koopman checksum, or a CRC will all provide significantly better fault detection.

These curves are for random independent bit faults. For memory arrays sometimes people are concerned with multi-bit single event upsets. The answer to how these will perform will depend on the specifics of the geometry of the memory array (for example, row size and whether columns are interleaved). Checksums and CRCs will generally be good at multi-bit faults in bits that are adjacent in the data word. And the 32-P checksums will detect all 1-, 2-, and 3-bit faults regardless of the bit position. Beyond that is difficult to know without analysis tailored to the specific situation of the memory array you care about as well as the comparative frequency and patterns of different numbers of bit faults.

Wednesday, February 28, 2024

Comparative speeds for different Checksum & CRC implementations

Comparative speeds for different Checksum & CRC implementations.

For more information see my new book:  Understanding Checksums and Cyclic Redundancy Checks -- which includes downloadable open source example implementations for the checksums mentioned below.  This is a preview of section 12.1 from that book.

Monday, February 19, 2024

Book: Understanding Checksums and Cyclic Redundancy Checks


Amazon e-book: https://www.amazon.com/dp/B0CVXWDZ99   (Also available from country-specific Amazon web sites for:  UK, DE, FR, ES, IT, NL, JP, BR, CA, MX, AU, IN).  

Updated, permanent post with more info here: https://checksumcrc.blogspot.com/p/book.html


Monday, January 1, 2024

Explanation of CRC Operation: Polynomial Division, Shift Register, and Comparison (video)

 Here is a video explaining the mechanics of Cyclic Redundancy Check computations, including:

  • An example of the polynomial division approach
  • An example of the XOR-feedback shift register approach
  • A step-by-step comparison of how the polynomial division and shift register approaches are actually doing exactly the same computation.
If you've ever wondered why it is that a shift register and polynomial division end up with the same result, this video explains it.

FULL VIDEO:   YouTube | Archive.org

The full video consists of these four parts: (YouTube Play List):
  • Part 1: Example CRC Division to compute check value (YouTube 4:25)
  • Part 2: Example CRC Validation via Polynomial Division (YouTube 3:30)
  • Part 3: Example CRC Computation via Hardware Shift Register (YouTube 8:10)
  • Part 4: Comparison of CRC Polynomial Division and Hardware Shift Register (YouTube 6:21)




Why To Avoid Hash Algorithms If What You Really Need Is A Checksum

Sometimes we hear that someone plans to use a hash function for fault detection instead of a checksum. This is probably not the best idea, b...