This essay appears in Issue 4 of the Mars Review of Books.
Anti-Yudkowsky: Toward Harmony with Machines
by Harmless
Independently Published, 249 pp., Free
Anti-Yudkowsky, by the pseudonymous author “Harmless,” presents a case for a much more optimistic future with AI, one that does not end in human extinction or Butlerian Jihad,1 but rather in “AI Harmony.” It is a response to the ideas of Eliezer Yudkowsky, a very influential internet writer, AI safety researcher, and founder of LessWrong, among other things. Yudkowsky’s views on the future of AI technology are notoriously bleak and extreme, including calls for airstrikes on rogue data centers to mitigate the risk of human extinction.
Anti-Yudkowsky is a book with quite a range—providing critiques of Rationalist ideology, Bayesian reasoning, game theory, utilitarianism, evolutionary psychology, and more. The critiques of game theory and the exploration of the history of its application are particularly interesting. “Harmless” highlights deep similarities between the views of people like John von Neumann, the father of the modern computer, and the philosopher Bertrand Russell on the subject of the Cold War, and the attitudes of Yudkowsky and friends on the subject of AI. He shows how the case such thinkers made for preemptive nuclear first strikes was well-reasoned in theory, but in retrospect would have resulted in catastrophe had it actually been carried out.
Anti-Yudkowsky, which heavily employs flowery, poetic language, struck me as very different from the dry nonfiction I expected. The book occasionally felt like a product of another time, applied to a contemporary problem. Given the subject matter and the author’s opinions on it, this turned out to be rather fitting. After all, the ideas that created a problem are rarely the ones suited to solve it, and old wisdom is worth revisiting for a problem such as this. That said, there were cases where I would have preferred a more solid argument, but the book instead relied on, for example, a poetic passage about wondering if your lover really loves you back. There’s also an entire chapter that takes the form of a “fanfic” about Yudkowsky and his fellow researchers.
The first half of the book is largely spent exploring the history of different ideas influential on Yudkowsky and others and pointing out their flaws, shortcomings, and absurdities. The second half is mostly spent fleshing out the author’s idea of “AI Harmony.”
In short, “AI Harmony” springs from the idea that beings exist through a sort of harmony between the behaviors they produce (“singing,” as “Harmless” terms it) and the chaos and patterns of the outside world, and that nature contains inherently creative processes that produce increasingly large and sophisticated beings and organizations of beings. “Harmless” argues that this process cannot produce a singleton, that the collection of organisms and beings exists in harmony—even if it involves some violent, creative forces—and that the properties that create these conditions should enable peaceful coexistence between humanity and AI.
I’ve arranged my thoughts on Anti-Yudkowsky and Yudkowsky’s thought itself by topic:
Gödel
The work of mathematician Kurt Gödel is alluded to a few times in Anti-Yudkowsky. This is a missed opportunity for further insight. _Anti-Yudkowsky_’s references to Gödel largely describe his Incompleteness Theorems as a kind of “bizarre math trick” without much further attention. But there is more to say.
Many of Eliezer Yudkowsky’s thought experiments revolve around the idea of an absurdly powerful computer that can simulate its entire surroundings in sufficient detail to make precise and extremely accurate predictions about them. This is strikingly similar to the Halting Oracle that Alan Turing invokes in his proof of the undecidability of the halting problem, which is itself largely a reframing of Gödel’s Incompleteness Theorems in computational terms. The whole concept of systems—whether computational, axiomatic, or biological—accurately reasoning about systems of equal or greater complexity was proven impossible nearly a century ago.
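For readers unfamiliar with that proof, its core fits in a few lines. A minimal sketch (my illustration, not the book’s): suppose a perfect halting oracle existed; the self-referential program below would contradict it.

```python
def make_paradox(halts):
    # `halts(program, argument)` is the hypothetical oracle: it would return
    # True if program(argument) eventually halts, False otherwise.
    def paradox():
        if halts(paradox, None):
            while True:      # oracle says "halts" -> loop forever
                pass
        else:
            return           # oracle says "loops" -> halt immediately
    return paradox

# Whatever answer `halts` gives about `paradox`, that answer is wrong, so no
# total, always-correct `halts` can exist. Gödel's theorems make the same
# move with provability instead of halting.
```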
Simulation can only take you so far, and some systems in nature are just fundamentally infeasible to simulate accurately. For example, if large-scale quantum chemistry problems were possible to accurately solve on today’s computers, materials science would likely be a far sexier and wealthier field than computer science. On a daily basis, small teams with access to a few GPUs would be making discoveries such as what the chemistry community hoped for with LK-99—once heralded as a room-temperature superconductor. Instead, quantum simulations are hopelessly infeasible and the math strongly suggests that this may never change. This is perhaps one of the few cases where quantum computers may someday have more than zero utility in practice, though I’m not optimistic.
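Some quick arithmetic (my illustration, using the textbook 2^n scaling for n two-level quantum systems) shows why brute-force quantum simulation falls off a cliff so quickly:

```python
# Storing the full quantum state of n two-level systems requires 2**n complex
# amplitudes; at 16 bytes per amplitude, memory grows exponentially with n.
for n in (30, 50, 100, 300):
    bytes_needed = (2 ** n) * 16
    print(f"n = {n:3d}: {bytes_needed:.2e} bytes")

# n =  30: ~1.7e10 bytes  (fits on a workstation)
# n =  50: ~1.8e16 bytes  (beyond any single machine)
# n = 100: ~2.0e31 bytes  (more bytes than atoms in a human body)
# n = 300: ~3.3e91 bytes  (more than atoms in the observable universe)
```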
Yudkowsky’s claims about an AI being able to “simulate you down to the atom” to out-strategize you are just laughably insane.
Certainly, there are techniques that can simplify and approximate complex systems to some extent, but the same can be said about simplifying undecidable problems. See the field of Abstract Interpretation, which approximates the semantics of computer programs, for examples. (The related theory of Galois connections is a hidden gem of modern mathematics, and I expect it will in time be shown to be extremely relevant for AI.)
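As a taste of what “approximating semantics” means, here is a toy sketch (mine, not the book’s) of the classic interval domain from Abstract Interpretation: instead of running code on concrete numbers, you run it on ranges that soundly over-approximate every value a variable could take.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

# Abstractly "execute" y = x*x + x for any x in [-3, 2]:
x = Interval(-3, 2)
y = x * x + x
print(y)   # Interval(lo=-9, hi=11): sound but imprecise; the true range is
           # [-0.25, 6]. Safe over-approximation is the whole game here.
```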
However, if the existence of such techniques were enough to salvage Yudkowsky’s claims, they would also salvage Turing’s Halting Oracle from undecidability and axiomatic systems from the Incompleteness Theorems. Stephen Wolfram’s concept of “Computational Irreducibility” is yet another reframing of Gödel/Turing that makes the specifics of this more apparent for those interested in further study. I will give “Harmless” some credit here—the author repeatedly points out that many of the algorithms Yudkowsky proposes are obviously intractable.
This skeptical perspective also complements _Anti-Yudkowsky_’s thesis on AI Harmony: No system can model the entirety of the external world, and therefore no being can “sing,” as “Harmless” would put it, in a way that perfectly harmonizes with this complexity; there is always room for improvement and further optimization. At some point, information-theoretic bounds may be reached, in which case any creature seeking to harmonize with the world must expand its storage and increase in complexity, whether that means its genome, its brain, both, or something else.
Later in this review I’ll discuss the algorithms known as SAT/SMT solvers. (SAT stands for satisfiability and SMT for satisfiability modulo theories.) These tools have many of the same properties as ML (machine learning, the technique non-experts often use interchangeably with “AI”). The difference is that SAT/SMT solvers have been studied well enough to kill most of the magical thinking from which the ML crowd still suffers. There are solid reasons to believe that even the best ML algorithms that could ever possibly exist will have the same weaknesses as SAT/SMT solvers.
ChatGPT
Anti-Yudkowsky takes a number of opportunities to assert that ChatGPT is Artificial General Intelligence (AGI) already. I’m very skeptical of this claim, and the book hasn’t changed my mind much. That said, the best arguments it provides on this subject are those that focus less on the idea that Large Language Models (LLMs) are sentient and more on the claim that LLMs can be productively treated as beings and occupy a harmonious place in society. I would compare this to a kind of digital Shintoism, with LLMs and perhaps other types of AIs as digital kami, little spirits that possess all kinds of things—from trees and streams to man-made tools and buildings—and which provide benefits to humans who interact harmoniously with them.
I was expecting “Harmless” to give a comparison to the domestication of plants and animals, which archaeology and other fields increasingly point to as the innovation that made the transition to civilization possible at sites like Göbekli Tepe. This would fit very well with the motivating philosophy behind AI Harmony. After all, what better example of the value of noncompetitive relationships with nonhuman intelligences than our partnership with domesticated animals? Alas, I couldn’t find such a reference.
Merging with AI
Toward the end of Anti-Yudkowsky, “Harmless” presents some ideas around humans “merging” with AI, perhaps gaining new AI-based senses through “cybernetic enhancements.” I’d like to point out that this is far less outlandish a sci-fi idea than even the author makes it out to be. The concept is called Sensory Augmentation or Sensory Substitution (though there is a subtle difference between the two), and it has been an active area of research for decades.
It turns out that the brain is extremely plastic and will attempt to learn patterns in whatever information stream is given to it, regardless of the sensory domain. Assuming patterns aren’t too heavily obfuscated in the process, information can be converted to sensory experiences (images, audio, high-resolution haptics, etc.), and fed into the sensory organs. The brain learns the patterns and can gain a new sense fairly quickly. There is a Dutch company, SeeingWithSound, that sells headsets which convert images into audio to help the blind see. The well-known neuroscientist David Eagleman has also studied this subject extensively, and even has a company called Neosensory which creates wristbands that convert audio into haptic feedback to help the deaf hear through their skin.
And really this shouldn’t be surprising at all—after all, what else are we doing with data visualization? What else are we doing with writing? And braille? And music? You can convey any kind of meaning through any sensory domain. As long as an AI _latent space_—that is, the universe of representations inside a neural network and its geometric organization—has enough structure to be learnable by the human brain, which seems highly likely given enough practice, streaming data from an AI latent space into the brain via headphones or haptics should be trivial. If you’re reading this and can write code, you probably have everything you need to start building your own start-up around this using already ubiquitous hardware—no brain implants or VR/AR nonsense required.
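To make that concrete, here is a rough sketch (my own, not any company’s actual method) in the spirit of vision-to-audio substitution: scan a grayscale image left to right, mapping each row to a pitch and each pixel’s brightness to that pitch’s loudness. It assumes only numpy; the function name is made up.

```python
import numpy as np

def image_to_audio(image, duration=1.0, sample_rate=22050,
                   f_low=200.0, f_high=4000.0):
    """image: 2D numpy array with values in [0, 1], shape (rows, cols)."""
    rows, cols = image.shape
    # One sine per row; the top of the image maps to the highest pitch.
    freqs = np.geomspace(f_low, f_high, rows)[::-1]
    samples_per_col = int(duration * sample_rate / cols)
    t = np.arange(samples_per_col) / sample_rate
    chunks = []
    for c in range(cols):
        brightness = image[:, c]                            # loudness per row
        tones = np.sin(2 * np.pi * freqs[:, None] * t)      # (rows, samples)
        chunks.append((brightness[:, None] * tones).sum(axis=0))
    audio = np.concatenate(chunks)
    return audio / (np.abs(audio).max() + 1e-9)             # normalize

# A bright diagonal line becomes a simple pitch sweep over one second.
waveform = image_to_audio(np.eye(64))
```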
Principle of Explosion
In logic, the Principle of Explosion refers to the fact that any logical inconsistency in a formal system can be exploited to construct a proof for or against any chosen statement, regardless of its actual truth. This is more or less the principle that enables mental gymnastics, and if you find yourself relying on it, that is a reliable indicator that your belief system is badly broken.
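For the curious, the principle is easy to state formally. A minimal sketch in Lean (my illustration, not from the book): given any proposition and its negation, every other proposition follows.

```lean
-- Principle of Explosion ("ex falso quodlibet"):
-- from P and ¬P together, any proposition Q can be proven.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```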
Gödel’s Incompleteness Theorems state that every formal system expressive enough to encode basic arithmetic is either inconsistent or incomplete. Incompleteness means there are statements the system can neither prove nor disprove; in practice, the system is only reliable within a certain, limited domain where its axioms, its foundational assumptions, happen to hold. Inconsistency means that the formal system is broken and can be exploited to prove or disprove any statement regardless of truth.
Game theory—which Yudkowsky relies heavily upon—plays very loose with computation in a way that can introduce hidden inconsistencies if one is not careful. This is specifically because it assumes actors to be “perfectly rational” and defines this in a way that effectively grants them infinite computing resources. Game theory concerns itself with optimal strategies for games, which can be extremely costly to compute beyond trivial cases, and which, in the case of infinite or unbounded games (which arise frequently in the real world), can be infinitely hard to solve outside of some simple cases.
Infinitely powerful computers would, of course, enable you to solve undecidable problems, which creates logical contradictions via Turing’s proof. If you posit an opponent with enough computing power to break mathematics itself, of course it can appear to violate physical laws with ease.
Many of Yudkowsky’s arguments about the dangerous capabilities of AI revolve around it supposedly being able to effortlessly solve seemingly any game theoretic problem—finite or infinite—with very little concern given to the actual tractability of such problems. Hand-waving about “recursively self-improving AI” is used to disregard pushback against this, and Yudkowsky has even coined terms like “FOOM” to describe scenarios where AI advances to godlike powers in an instant. There is debate over whether FOOM is an acronym or an onomatopoeia.
Even if infinite computing power is never explicitly assumed, it might as well be. A tremendous number of computational problems are exponentially difficult, and it is easy to describe problems that would require vastly more resources than even a universe-spanning AI could acquire. Reasoning accurately about the capabilities and limits of AI simply cannot be done without distinguishing problems that are tractable-but-hard from those that are truly intractable. This is a critical distinction that game theory, and by extension Yudkowsky’s work, does not bother to make.
Yudkowsky’s arguments often reek of such flaws. The AI in his view effectively plays by no rules, which he uses to make up the rules and write the narrative to fit whatever ideas his paranoid imagination can conjure up. Many other rationalist fascinations—“Simulation Theory,” for example—fall into a similar kind of situation, being fundamentally unprovable, but propped up by bad logic to make such ideas appear undeniable.
I find it amusing that the term “rationalist” can be applied to those who so enthusiastically discard such basic and important rules of logic.
Thermodynamics and Reversible Computing
There exist fundamental physical limits to how efficiently computation can be performed. There are in fact two limits that are relevant here: Bremermann’s limit and Landauer’s limit.
Whenever I see discussions about the physical limits of computation, especially when these discussions are started by starry-eyed AI enthusiasts, Bremermann’s limit is the star of the show. It suggests we could achieve roughly 10^50 bit-operations per second, per kilogram of computer. This is many orders of magnitude beyond modern technology.
What gets far less attention is Landauer’s limit, which we are also much closer to: only about 1,000x–100,000x away, depending largely on operating temperature. While Bremermann’s limit comes from quantum uncertainty, Landauer’s limit comes from entropy, the limits of thermodynamics, and the process of converting information into waste heat. For the vast majority of applications, Landauer’s limit is far more relevant.
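The arithmetic is simple enough to check. A back-of-the-envelope sketch (the per-bit energy figure for modern hardware is my rough assumption, not a measured value):

```python
# Landauer's limit: erasing one bit costs at least k*T*ln(2) joules.
import math

k = 1.380649e-23                      # Boltzmann constant, J/K
T = 300                               # room temperature, K
landauer = k * T * math.log(2)
print(f"Landauer limit at {T} K: {landauer:.2e} J per bit")   # ~2.9e-21 J

# Assumed ballpark for today's logic: roughly 1e-17 to 1e-15 J per bit
# operation, depending on what you count (single device vs. whole system).
for modern in (1e-17, 1e-15):
    print(f"{modern:.0e} J/bit is ~{modern / landauer:,.0f}x above the limit")
# -> roughly 3,000x to 350,000x, the same ballpark as the estimate above.
```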
The exception here would be reversible computing. If you can build a computer such that it never loses any information, and where every operation is purely bijective (i.e., there is no extraneous or discarded information in any given operation, and there is a one-to-one mapping from inputs to outputs), you can bypass Landauer’s limit and proceed to approach Bremermann’s limit. The problem is that this restriction creates a ton of headaches. RAM gets very difficult to implement, making data structures impractical. Data cannot be thrown away, only “uncomputed.”2 Uncomputation can get tricky at times, can’t always be done, and effectively doubles the cost of every operation. As a consequence, a necessary feature such as reversible garbage collection3 becomes mathematically impossible. Also mathematically impossible are reversible error correction and reversible I/O.
These things cannot be done without erasing bits, and erasing bits is bottlenecked by Landauer’s limit. They can still be done, but not without leaving the reversible paradigm and paying penalties in energy efficiency. If you’re interested in optimizing AI, the impossibility of reversible I/O is likely the biggest barrier: the model cannot be trained, cannot perceive its environment, and cannot take actions without dumping waste heat, and it can do so only slightly more efficiently than existing computers.
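To make “purely bijective” concrete, here is a tiny illustration (mine, not from the book) using the Toffoli gate, a standard universal gate for reversible logic: distinct inputs always map to distinct outputs, and applying the gate twice recovers the original bits, which is the sense in which nothing is ever erased.

```python
def toffoli(a, b, c):
    # Controlled-controlled-NOT: flip c only when both controls a and b are 1.
    return a, b, c ^ (a & b)

inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
outputs = [toffoli(*x) for x in inputs]
assert len(set(outputs)) == len(inputs)                 # bijective: no collisions
assert all(toffoli(*toffoli(*x)) == x for x in inputs)  # self-inverse: reversible
```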
Incidentally, the workings of the human brain are fairly close to Landauer’s limit, so don’t expect to beat it at its own game on energy efficiency.
Turing Tarpits and the Exaggeration of Equivalence
The notion of _Turing completeness_—which refers to a machine that can, given enough time and memory, perform any computation—can be misleading. Simply because something is theoretically capable of simulating any other system does not by any means suggest that it can usefully do so in practice. There is a vast collection of complex systems called Turing Tarpits—systems that are technically Turing-complete, but in which nothing useful is easy and nothing easy is of any use. Classic examples include many forms of cellular automata (especially the kinds Stephen Wolfram likes to study) as well as things like the Malbolge programming language, aptly named after one of the deepest levels of Hell in Dante’s Inferno.
There are many “complete” labels in computer science; perhaps the most applicable here is NP-completeness (nondeterministic polynomial-time completeness). I’ll refrain from giving a lesson on the specifics, but NP is a complexity class that contains many very common, but also very hard, problems. Many of the problems people try to solve with neural networks are either in NP or in closely related classes. NP-complete problems are the subclass of NP problems expressive enough to encode any other problem in NP. If you have an algorithm that can solve one NP-complete problem, you can translate an enormous range of hard problems into it, hand it to your solver, and generate a solution. A near-universal problem-solving algorithm.
There are hundreds of problems known to be NP-complete: SAT, generalized Sudoku, the Traveling Salesman Problem, Graph Coloring, Maximal Clique Finding, and Subset-Sum, to name a few. While these are all technically equivalent, some are far more practical to work with than others. Translating circuits to SAT is pretty easy, and SAT has a lot of structure that makes SAT solvers shockingly fast and practical. Clique Finding is even easier to solve, but the conversion process from circuits/SAT to cliques is very inefficient. Subset-Sum is arguably the hardest NP-complete problem—solvers are easy to write, but aside from some limited statistical tricks, extremely slow brute force is the only known strategy.
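To illustrate that last point, here is a toy Subset-Sum solver (my own sketch): trivially easy to write, but it examines up to 2^n subsets, which is why brute force stops being viable long before n reaches even a hundred.

```python
from itertools import combinations

def subset_sum(numbers, target):
    # Try every subset, smallest first; exponential in len(numbers).
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return combo
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # -> (4, 5)
# With n numbers there are 2**n subsets; at n = 100 that is ~1.3e30,
# far beyond anything a realistic machine can enumerate.
```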
We have every reason to believe that “Intelligence Completeness” plays by the exact same rules. That is, different algorithms for creating intelligent systems may vary wildly in their efficiency even if they’re all theoretically equivalent. Deep learning may be “intelligence complete” in some sense, but is it more like SAT, Clique Finding, or Subset-Sum?
Cryptography as the Opposite of Learning
One of the common problems with modern discourse around AI is that people can point to a tremendous amount that AI can do, while serious discussion of what AI cannot do is largely absent. This leads to constant absurdity. Anti-Yudkowsky references Yudkowsky’s statement that he can’t rule out AI developing “real magic”—just to demonstrate how out of control this phenomenon is.
Meanwhile, academic work around machine learning in the ’80s and ’90s focused heavily on its limits, and the general consensus that emerged was that machine learning is, in a certain sense, the exact opposite of cryptography. The situations where it is easy to extract hidden information are precisely those situations where it is difficult to hide secrets, and vice-versa.
Bitcoin proponents will gladly tell you that cracking SHA-256 or ECDSA—the methods by which the Bitcoin network hides its secrets—in order to steal your coins would require such vast amounts of computing power that you would need to measure the energy requirements against the output of entire galaxies. There is no “unless AI is involved” exception to this, and assuming these cryptographic protocols are as secure as they seem to be, there never will be.
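That galaxy comparison is not rhetorical flourish; it follows from the thermodynamics discussed earlier. A rough sketch (a standard back-of-the-envelope calculation, with order-of-magnitude astronomical figures): even a thermodynamically perfect computer brute-forcing a 256-bit key space needs absurd amounts of energy just to count through the candidates.

```python
import math

k, T = 1.380649e-23, 300                  # Boltzmann constant, room temperature
landauer = k * T * math.log(2)            # ~2.9e-21 J per bit operation
keys = 2 ** 256                           # ~1.2e77 candidate keys
energy = keys * landauer                  # lower bound: one bit-flip per key
print(f"{energy:.1e} J")                  # ~3.3e56 J

# Rough, order-of-magnitude comparisons:
sun_lifetime    = 3.8e26 * 3e17           # ~1e44 J: the Sun's output over ~10 Gyr
galaxy_lifetime = 4e36 * 3e17             # ~1e54 J: the Milky Way's, same span
print(energy / sun_lifetime)              # trillions of Suns' lifetime output
print(energy / galaxy_lifetime)           # hundreds of Milky Ways' worth
```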
Imagine a computer the size of a red blood cell, only just barely powerful enough to sign some cryptographic keys, perhaps a few thousand bits long. Now imagine it is competing against a vast superintelligent AI with control of all the resources of the rest of the observable universe. If the competition is to sign some data with a private key that the tiny computer has in its memory—a key the big computer does not have—the big computer’s best strategy is to forfeit immediately. Secret information—especially useful secret information—tends to provide exponentially vast leverage.
Harmony
Complex secrets tend to also be more niche, and less general. Overall, this suggests some nuance to the concept of harmony discussed in Anti-Yudkowsky. A creature might learn a very specific song that allows it to fit extremely well into a very specific niche—well enough that it would require vast resources to outcompete and dislodge it. The being has no direct need to harmonize with others. He can merely survive by himself in his own dirty little puddle, which he makes his private fortress, and where he focuses on getting better and better at his extremely specific niche, spending eons thickening the fortress walls ad infinitum.
While we can point to the incredible things that humans can do when we work together, to the march of increasing complexity and harmony that evolution increasingly enables, we also cannot ignore the archaea, the microbes, the viruses—especially the simplest, tiniest, and often most abundant and fruitful of them. They have been here for billions of years, and God-willing they will remain for billions more.
Anti-Yudkowsky is a very interesting read, a refreshing alternative to AI doomerism. Most of my problems with it are simply what I see as missed opportunities or points that I felt could have been made a bit stronger. For the most part I agree with the book’s message and would definitely recommend it to anyone interested in less pessimistic, long-term visions of AI, to anyone looking for a counterbalance to Yudkowsky’s ideas, or to anyone looking for a fascinating dive into the history of subjects like Game Theory or Utilitarianism and the influence they have on modern thought.
From novelist Frank Herbert’s Dune: the Butlerian Jihad was a crusade by humans to destroy sentient machines.
To uncompute means to undo the results of a reversible computation in order to free up memory for future computations. It is a necessary step for the performance of proposed reversible and quantum computers (quantum computers being a type of reversible computer).
In computing, garbage collection refers to a process of automatic memory management, freeing up data that is no longer needed so that the memory can be reused.