Skip to content

Lesson 01 - Research Orientation, GF(2), and the Hamming Code

Prerequisites And Diagnostic Checks

Before starting, answer these quickly:

  1. In a stabilizer code, what does a parity check measure?
  2. Why does a surface-code check feel hardware-local on a 2D grid?
  3. What does it mean for a binary vector to satisfy \(Hc^T=0\)?

If the third question is still fuzzy, this lesson should be taught slowly. If it is easy, compress the GF(2) arithmetic and spend more time on the Hamming code.

Concrete Motivation From Conjecture 3

Conjecture 3 is about a mismatch between a code and a machine.

The code side is a Tanner graph \(T_n\): qubits and checks are connected by the parity-check relations of a QLDPC code. The hardware side is a graph \(\mathfrak G\): physical qubits are connected only where native two-qubit gates are allowed. For surface codes those two graphs are geometrically compatible. For good expander-style QLDPC codes, they are not.

The intended lower-bound story is:

good QLDPC parameters -> expansion -> cross-cut syndrome demand -> small 2D hardware cut -> depth lower bound.

This first lesson starts at the algebraic bottom of that chain. Before discussing expansion or hardware cuts, we need the language of binary linear codes.

GF(2) Arithmetic As Algebra

GF(2) is the field with two elements, 0 and 1.

Addition is XOR:

  • \(0+0=0\)
  • \(0+1=1\)
  • \(1+0=1\)
  • \(1+1=0\)

Multiplication is the usual Boolean AND:

  • \(0\cdot 0=0\)
  • \(0\cdot 1=0\)
  • \(1\cdot 0=0\)
  • \(1\cdot 1=1\)

Two features matter constantly:

  • subtraction is the same as addition, because \(-1=1\) in GF(2)
  • parity checks are linear equations over GF(2)

So a binary parity check like "bits 1, 3, and 4 have even parity" is the equation

\[ c_1+c_3+c_4=0 \]

over GF(2). This is exactly the algebra behind syndrome extraction.

Vector Spaces Over GF(2)

The space \(\mathrm{GF}(2)^n\) is the set of all length-\(n\) bit strings. Addition is bitwise XOR.

For example, in \(\mathrm{GF}(2)^3\),

\[ (1,0,1)+(0,1,1)=(1,1,0). \]

A subspace is a set of vectors closed under XOR. A linear code is usually such a subspace. If a matrix \(H\) records parity checks, then the code is

\[ C=\ker(H)=\{c\in \mathrm{GF}(2)^n : Hc^T=0\}. \]

This is the first important translation:

  • a codeword is a vector that satisfies all parity checks
  • a syndrome is \(Hc^T\)
  • an error is detected when the syndrome is nonzero

Worked Example Before Abstraction: The [7,4,3] Hamming Code

The [7,4,3] Hamming code encodes \(4\) information bits into \(7\) physical bits and has distance \(3\).

A common parity-check matrix is built by using the nonzero binary columns of length \(3\):

\[ H= \begin{bmatrix} 1&0&1&0&1&0&1\\ 0&1&1&0&0&1&1\\ 0&0&0&1&1&1&1 \end{bmatrix}. \]

A valid codeword \(c=(c_1,\ldots,c_7)\) satisfies \(Hc^T=0\). Each row is one parity equation. For example, the first row says

\[ c_1+c_3+c_5+c_7=0. \]

There are \(7\) bits and \(3\) independent parity checks, so the dimension is \(k=7-3=4\). That gives \(2^4=16\) codewords.

Why is the distance \(3\)? In a linear code, the distance is the minimum Hamming weight of a nonzero codeword. Equivalently, for a parity-check matrix, it is the smallest number of columns of \(H\) that sum to zero. No single nonzero column is zero, and no two columns are equal, so no one-column or two-column dependence exists. But some triples sum to zero. Hence \(d=3\).

Generator Matrix Versus Parity-Check Matrix

The parity-check matrix \(H\) describes the constraints. A generator matrix \(G\) describes how to produce codewords from messages:

\[ c=mG. \]

The compatibility condition is

\[ HG^T=0. \]

That equation says every generated codeword satisfies every parity check.

Conjecture Linkage

In QLDPC research, matrices like \(H\) become large and sparse. Their row weights and column weights stay bounded, but their graph structure becomes highly connected. The Hamming code is tiny, but it already contains the main grammar:

  • columns are physical coordinates
  • rows are checks
  • \(Hc^T\) is a syndrome
  • distance is the minimum undetected nonzero pattern

The later QLDPC lessons replace this small matrix with families of sparse matrices whose Tanner graphs expand.

What This Does And Does Not Prove

This lesson proves no hardware lower bound. It only builds the algebraic language. The lower bound needs graph expansion and hardware cut capacity, which come later.

It also does not say that all good QLDPC codes are just larger Hamming codes. The Hamming code is a teaching example, not the model construction.

Active Recall

  • Why is \(1+1=0\) the algebraic reason parity checks are linear?
  • Explain \(Hc^T=0\) in both matrix language and stabilizer-syndrome language.
  • Why does the [7,4,3] Hamming code have dimension \(4\)?
  • Why does "three dependent columns" correspond to distance \(3\)?

Next-Step Handoff

Next we keep the Hamming example but abstract it: systematic generator matrices, parity-check matrices, dual codes, code parameters, puncturing, and shortening.