AI alignment is not enough

Beyond alignment, pause and shut down: what helps against all X-risks?

Jul 18, 2023

The danger that much-smarter-than-human AGI poses to the survival of our dear species is so big that we should not rely on one approach only. Rather, let a thousand flowers of defense against the threat bloom.

But first, let's take note of and appreciate Eliezer Yudkowsky's open letter in Time. Eliezer is one of the most influential intellectuals in the world (among serious people, not the intellectual aristocracy). Among his contributions are his writings on rationality and his alarm about the X-risk (existential risk for the human species) that artificial general intelligence (AGI) poses. Here are the most alarming parts of his letter:

“Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in “maybe possibly some remote chance,” but as in “that is the obvious thing that would happen.” […]
The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.”

This is a serious problem. The currently favoured answer among AI researchers is alignment. That is, the superhuman AGI will be aligned with human goals. Creating a system that is aligned with one's goals is super challenging, as we all know from economic policies or incentive systems for employees. We haven't figured out how to incentivize our political servants (people like presidents and prime ministers), the CEOs of the companies we own (as in 0.00000000000001%), our teachers, or our students to do what maximizes joint surplus and/or what we want. So we are bad at alignment as it is. Our track record therefore suggests that we should not rely on alignment alone when it comes to averting X-risk.

Eliezer is similarly worried that alignment won't work, or at least that AI research is nowhere near successfully aligning AI. Hence, he writes:

If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.
There’s no proposed plan for how we could do any such thing and survive. OpenAI’s openly declared intention is to make some future AI do our AI alignment homework. Just hearing that this is the plan ought to be enough to get any sensible person to panic. The other leading AI lab, DeepMind, has no plan at all.

Maybe it is an overstatement that they have no plan. Still, to me the situation seems like a medieval port city pointing to research on “antibiotics” in order to combat the plague. No wonder that wise women and men respond to this proposal with a more practical and immediate remedy: a pause or, as Eliezer does, a shutdown. Basically a quarantine. Don't let in this thing that you don't understand, that you cannot hope to control, that does not care or know that you are alive, and that can totally overwhelm you.

Eliezer concludes:

Shut it all down.
We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

Alignment may not work, and Eliezer's plan, an indefinite worldwide moratorium, may not work either.

Also, maybe AI is not an X-risk; maybe “AI Will Save the World”, as Marc Andreessen argues. Now Andreessen is one smart cookie. (This expression is for your amusement, dear reader; it is meant as a fun understatement.) So if Andreessen and Yudkowsky (and many others) disagree, I think it is fair to say that there is uncertainty.

So I think we need to think about things that can be done in addition to alignment, a pause, or a shutdown. (For present purposes, I refrain from taking a position on these options.)

So the modest proposal is this: let's make our species more resilient to all kinds of X-level threats. There are at least two big reasons why that is a good idea:

X-level threats are here to stay. So a general-purpose “remedy” / “technology” / “preparedness” would be neat.
The X-level threat that will get us is likely not the one we see coming a long time ahead. After all, if you see a threat coming, you get afraid and take action. [1]

Of course, this raises the question: what makes our species more resilient to all kinds of X-level threats?

Resilience against unspecified threats necessarily means rather generic things. To put it simply, we must raise species capacity. Let us define species capacity analogously to state capacity, with two differences: it is not limited to the state or government but includes all of a society's capacity, and it covers the entire species, not just the humans living in a particular country. In which dimensions species capacity could most fruitfully be raised is a question of its own. As an example, take Covid. Quickly coming up with vaccines and mass-producing them displayed our species' capacity. And clearly, in that dimension our capacity was much higher than before. Similarly, people across the globe have an understanding of the germ theory of disease, and that understanding, together with voluntary action or informed compliance with mandates, was really helpful. This is another example of species capacity: we have a theory of disease, and ordinary people use it to respond effectively to novel threats.

A lot of my writing on this blog will be motivated by that idea and will examine what contributes to species capacity. Maybe you, whether you are a kindergarten teacher or a venture capitalist, also contribute to it.

[1] Even if you see a lethal threat coming, it might kill you if there is nothing promising that you can do. Yudkowsky would presumably point out that this is pretty much the situation we are in with AI X-risk. We see it coming, we can take lots of measures, but nothing seems really promising.

Schonger Substack
