

TLDR: There's a problem in combinatorics that asks for a maximally large set of bit strings of length N such that every pair of chosen strings has a Hamming distance of at least D. I spent a month, 1.5 billion tokens, and a small amount of brain sweat trying to make any progress on this problem, and got nowhere.

A while back, I saw the Youtube video the 7-segment display is suboptimal, which despite the name is ultimately about exactly this problem. In math, a chosen subset of bit strings is called a "binary code", since it is useful when dealing with information transmission and error correction. The problem of trying to cram in as many "words" as possible (the bit strings you choose to form a code with) while maintaining a minimum Hamming distance (the number of bit positions in which any two words differ) is called "binary code packing", but it seems the literature generally just calls it "binary codes of length N and distance D".
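
To make those definitions concrete, here's a tiny Python sketch (my own illustration, not anything from the video or the literature) of the two checks involved: the Hamming distance between two words, and whether a set of words actually forms a code of minimum distance d.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of bit positions where the two strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_code(words: list[str], d: int) -> bool:
    """True if every pair of words is at Hamming distance >= d."""
    return all(hamming(a, b) >= d for a, b in combinations(words, 2))

# Toy example: four words of length 4 with pairwise distance >= 2.
print(hamming("0110", "1100"))                       # 2
print(is_code(["0000", "0011", "0101", "0110"], 2))  # True
```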

You can also see this problem given some more explanation in this SMBC webcomic, which frames it as packing spheres in an N-dimensional hypercube. If that's your jam, I guess.

There is a number, often written as A(n, d), that tells you the maximum number of words in your code given these constraints. You can find a (reasonably current) table of values for different lengths and distances compiled by Andries Brouwer here. Note that for some instances, we have an exact answer: A(9, 4) is equal to 20. You can find 20 bit strings of length 9 such that no two strings have a Hamming distance less than 4. But you cannot do the same with 21 strings: 20 is the maximum.

But other cells in the table don't give an exact value, only a range: for instance, A(17, 6) is at least 258 and no more than 340, but we haven't proven exactly where it falls.

After watching the above Youtube video, I got interested in this problem. It looks deceptively easy: a lot of combinatorics makes you think "couldn't you just brute-force it?" After all, there are only ~131K bit strings of length 17. But you're not checking individual bit strings: you're checking combinations of them, around 258 at a time. So you would actually have to brute-force 131072! / ((131072 - 258)! * 258!) candidate sets, which is a very large number: over 800 digits long.
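
If you want to sanity-check that number yourself, the whole back-of-the-envelope computation fits in a few lines of Python:

```python
from math import comb

n_strings = 2 ** 17   # all bit strings of length 17 (131072 of them)
code_size = 258       # the number of words we'd need to pick

# 131072! / ((131072 - 258)! * 258!), i.e. "131072 choose 258"
ways = comb(n_strings, code_size)
print(len(str(ways)))  # number of decimal digits in that count
```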

So, if you can't use brute force, what else can you do? Most of the codes found so far use clever mathematical constructions built around groups, symmetries, automorphisms, and other words I pretend to understand. I figured this might be worth trying out these newfangled LLM thingies on. After all:

I booted up Claude Code inside a Linux container (for sandboxing, which allows me to use "dangerously skip permissions" mode), gave it my initial prompt, and let it go. It did some initial lit review, came up with a batch of ideas to try, and then systematically failed to make any progress on any of them. It tried various heuristics, various group-based constructions, SAT solvers, SAT solvers with lots of added symmetry-breaking and other clauses, Cliquer (to find graph cliques), tabu search on existing codes, swap-and-repair, and a bunch of other things I don't know well enough to describe, let alone explain.
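
To give a flavor of what even the dumbest baseline looks like (this is my own toy sketch, not one of the approaches above), here's a greedy construction: scan every bit string in order and keep anything that doesn't conflict with what you've already kept.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of bit positions where the two strings differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_code(n: int, d: int) -> list[str]:
    """Scan all length-n bit strings in lexicographic order and keep any
    string at distance >= d from everything kept so far."""
    code = []
    for bits in product("01", repeat=n):
        word = "".join(bits)
        if all(hamming(word, w) >= d for w in code):
            code.append(word)
    return code

# A valid code for n=9, d=4, but well short of the optimal 20 words
# that A(9, 4) = 20 promises exist.
print(len(greedy_code(9, 4)))
```

Greedy gets you a legal code almost for free; closing the gap to the best known codes is where all the group theory and search machinery comes in.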

None of these made any progress. At several points, Claude Code would say something like "wow, this problem seems really hard. Maybe we should give up" and I had to basically be like "okay, but hypothetically, what could we try that might work" and then when it listed its ideas I said "cool, let's try those then".

I love having to trick my computer into doing the things I wanted it to do.

Realistically: a lot of these problems have been open for decades. I wasn't expecting to be able to come up with anything, but I figured it was worth a shot.

"But wait!" I hear you say in the future, somehow. "It could be that current lower bound is optimal! You'd be throwing compute resources at an impossible task!" That is a good point. If we assume that there is no better code out there, we'd then want to prove it. Basically, instead of increasing the lower bound, we'd want to decrease the upper bound. And once they meet: boom, you have an exact answer.

The proofs behind existing upper bounds are actually pretty interesting: a lot of them come down to a statistical argument, where you track how many pairs of code words sit at each distance, feed those counts into carefully chosen polynomials, and eventually bound the total number of code words that can fit without some pair dipping below the minimum distance. You can also add more constraints to tighten the bound further, but this gets trickier and harder to trust.
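
If you're curious what the simplest version of that argument looks like in practice, here's a toy rendition of the classic Delsarte linear-programming bound (my own sketch using SciPy; this plain version is generally weaker than the published bounds, which layer extra constraints on top):

```python
from math import comb
from scipy.optimize import linprog

def krawtchouk(n: int, k: int, x: int) -> int:
    """Binary Krawtchouk polynomial K_k(x) for length n."""
    return sum((-1) ** j * comb(x, j) * comb(n - x, k - j) for j in range(k + 1))

def delsarte_lp_bound(n: int, d: int) -> float:
    """Plain Delsarte LP upper bound on A(n, d).

    Variables: B_i = (averaged) number of codeword pairs at distance i,
    for i = d..n. Maximize 1 + sum(B_i) subject to B_i >= 0 and, for
    every k, K_k(0) + sum_i B_i * K_k(i) >= 0."""
    dists = range(d, n + 1)
    c = [-1.0] * len(dists)  # maximize sum(B_i) <=> minimize -sum(B_i)
    A_ub = [[-krawtchouk(n, k, i) for i in dists] for k in range(n + 1)]
    b_ub = [krawtchouk(n, k, 0) for k in range(n + 1)]  # K_k(0) = C(n, k)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return 1.0 - res.fun  # 1 (the zero word) + best achievable sum(B_i)

print(delsarte_lp_bound(17, 6))  # an LP upper bound on A(17, 6)
```

The "add more constraints" step mentioned above amounts to bolting extra inequalities onto this same LP, which is exactly where things start getting harder to trust.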

I actually did try having Claude Code work on this a bit, but gave up there too. A couple of reasons: it's even more over my head than the rest of the problem. And worse, it's non-constructive: if we come up with a new set of constraints that lowers the upper bound, we don't have a certificate. We basically just have to publish the code and the results, and pray that there's not a bug in there somewhere. After watching Claude chase down bugs in its constraint solver, or fail to approach the existing upper bound, I gave up.

And remember that there is an exact answer here. It would take an infeasible amount of time, but you could brute-force search for it. So the lower bound is too low, or the upper bound is too high: there is progress to be made either way. It's just that lowering the upper bound is a trickier proposition since it can't be easily checked, so it is possible that this was doomed from the start. If you go that route, you might want to have some kind of automated certificate or formalization.

So overall: an interesting exercise, but no new mathematics. I think this is worth keeping in mind when you see results like "Erdos Problem XYZ was solved by AI" or "AI just solved a 45,000-year-old combinatorics problem". There is genuinely interesting work in the space, and some of it is impressive on its own. But I have to imagine that some of it is a case of "try a method on 1000 problems, publish the handful of hits".

If you'd like to take a look at the "lab notebook" that Claude Code kept updating during iteration, you can find it at this gist.

I was able to basically poke and prod at this a couple of times a day for a month, and it managed to try out a bunch of existing techniques on each instance of the problem. That can be good for "smoothing out" the literature, but I'm skeptical it'll lead to massive advancements in the field. And I'd exercise caution in looking at any single result in isolation: people generally don't publish their negative results (they should!), so you're just left with the highlights. It's not lying, it's just polishing. And sometimes, you polish something so much it becomes a mirror, and it distorts how you see the world. (Note to self: write a closing sentence that's less schmaltzy.)