An Attempt to Explain my AI Risk Explainer Attempt
I spent much of 2023-2025 writing an explainer about AI risk, with what I hoped would be a unique twist. However, I discovered foundational problems with that twist, which led me to abandon the work.
(Or you can skip ahead to my lists of AI risk explainers that actually exist.)
What was I writing about?
Some people are worried that AI will have the capacity to cause human extinction (sometimes called AI risk, AI doom, or AI safety1). Other people disagree. I wanted to write something to explain why the worriers are worried.
Retrospectively, I was aiming to serve a number of different target audiences:
Some folks have never heard of AI risk, and want a high level overview of the concept.
Some folks reflexively reject the premise of AI risk as absurd. Even if I didn’t convince them that AI risk is a significant possibility, I would have liked to move them to a concrete stance like “AI risk is incredibly unlikely, because of reasons X, Y, and Z”.
Some folks have concrete, detailed disagreements with AI risk scenarios. Ideally, I wanted to present counter-arguments to these disagreements, but some of them are not currently resolvable. For example, is consciousness compatible with physicalism? We can't currently resolve this, but people have strong opinions about it, and it seems relevant to some people's estimation of AI risk. Even in these cases, I wanted to bring the reader and me to a common crux: a point where we could begrudgingly see that our disagreement would resolve one way or the other if only we had additional evidence or understanding.
I was aiming at an audience similar to my software engineering2 colleagues (my job when I was starting to write). That is, I tried to cater to an audience with interest in technical arguments and some level of university-level general knowledge.
I also meant for this explainer to be static and evergreen:
I tried to make sure I didn't rely on specific current events or properties (such as whether leading AIs are LLMs, as they are at the time of writing). I was trying to argue that AI risk is a deep structural problem, robust to changes in specific details. I included contemporary details as set dressing or emphasis, but avoided basing fundamentals on them.
The evergreen focus meant that I omitted any mention of what we should do, excising any call to action. Partly this is because AI risk response is not currently static and partly because I don’t actually know what action to call readers to. If I knew what we should be doing, I would probably be doing that instead of writing.
Practically speaking, I didn’t want to commit to doing maintenance on a long and detailed post3.
There are already many AI risk explainers: what was the unique thing this project was trying to do?
There are two related problems with explaining AI risk through written formats, which this post calls the detail dilemma:
People do not want to read long-winded explainers full of detailed arguments. Too long, didn't read.
Readers of short explainers can easily come up with counter-arguments (“an AI is doing bad things? Why can’t we just turn the computer off?”), but because the work is short, it does not have room to address every detail and possible counter-argument. If a reader starts out skeptical, the failure to address their specific arguments may be enough to cause them to reject the entire concept of AI risk as unfounded.
Adding enough detail to address everyone’s potential arguments brings us back to the first problem with long explainers…
My approach to threading the detail dilemma was to craft a new writing format with expandable levels of detail. As a brief, completely serious example:
- 🔗~152 words Cats should rule the earth.
  - 🔗~39 words Humans have done a poor job of running the Earth; cats could hardly do a worse job.
    - (Here I would tally up the sins of the humans; on the other hand, we could only list such cat sins such as "broke the vase", "barfed on the sofa", and "caused me to trip while carrying something heavy".)
  - 🔗~41 words Cats already believe they rule the earth, so they will be natural rulers.
    - (I would include anecdotes here, like demanding to be let into a room and then immediately demanding to be let out of the same room.)
    - (Any reports that cats do not possess sufficient authority to be fed on demand are slander.)
  - 🔗~22 words Cats are too cute, and therefore deserve to set all aspects of government policy.
    - (I would include photos here, to remind people of cat cuteness.)
    - (The link between cuteness and policy making ability should be obvious.)
  - 🔗~81 words Cats have the means to rule the earth.
    - 🔗~9 words Cats have already started engaging in biological warfare in support of their take over.
      - (Cats already spread Toxoplasmosis. Maybe it changes human behavior!)
    - 🔗~21 words Cats have already started using computer interfaces.
      - (I would include photos of cats laying on keyboards, and the resulting chat messages. Our best pstpstpscientists are on learning Python.)
    - 🔗~18 words Cats already regularly draw human blood.
      - (I would include videos of cat bath times. Those calm cats are collaborators, don't pay attention to them.)
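Mechanically, an example like this is just a tree of claims, each with some supporting text and optional child claims. The sketch below shows one way such a tree could be rendered as nested HTML foldouts; it is purely illustrative (plain `<details>` elements, with made-up `Foldout` and `render` names), not the implementation I actually used.

```ts
// Illustrative sketch only: plain <details> rendering of a claim tree,
// not the real implementation.
interface Foldout {
  claim: string;        // the short, always-visible summary line
  body?: string;        // placeholder detail shown once the foldout is opened
  children?: Foldout[]; // subordinate arguments, hidden until requested
}

// Count every word a reader commits to by opening this foldout: its own body
// plus everything nested beneath it.
function wordCount(f: Foldout): number {
  const own = (f.body ?? "").split(/\s+/).filter(Boolean).length;
  return own + (f.children ?? []).reduce((sum, c) => sum + wordCount(c), 0);
}

// Render the tree as nested <details>/<summary> elements, so a reader unfolds
// only the branches they care about.
function render(f: Foldout): string {
  const inner = (f.children ?? []).map(render).join("");
  return `<details><summary>🔗~${wordCount(f)} words ${f.claim}</summary>` +
    `<p>${f.body ?? ""}</p>${inner}</details>`;
}

const example: Foldout = {
  claim: "Cats should rule the earth.",
  children: [
    {
      claim: "Cats are too cute, and therefore deserve to set all aspects of government policy.",
      body: "(I would include photos here, to remind people of cat cuteness.)",
    },
  ],
};

console.log(render(example));
```

The word counts next to each summary are what let a reader budget their attention before deciding to open anything.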
A longer example with additional effort would be my argument about why AI War (conflict between AIs) would not prevent AI risk4.
I figured a format built around nested foldouts (henceforth referred to as the foldout format) would have some nice properties:
1. A Reader Driven Deep Dive
Readers could choose where they wanted to dive into the details (a sort of choose your own adventure); they could skip arguments if we agreed, and dive deep where we disagreed.
My understanding is that skilled readers already do this with long texts (like a PhD thesis): why read the entire thing if only one part is relevant to your interests? Instead of passively following the text, one reads more actively, using existing signposts like headings and paragraph breaks to skip to the most interesting material.
However, not everyone is a skilled reader. Perhaps it is good if the writing format itself forces the reader into an active reading stance. I could argue that the foldout format does this, since the reader has to actively decide whether to follow any particular line of thought into a foldout.
2. A Structured Deep Dive
Writing is usually linear, but we readers need to recover the structure of the argument the writer had in mind from this linear stream: “Ok, this sentence sure seems like the local thesis. This paragraph is addressing an argument about the thesis. This page is touching on some minor detail related to the thesis. This paragraph seems to be tying the thesis to a previou- oh, never mind, it’s actually making a new point.” Good writing makes this reconstruction by the reader natural and easy.
The foldout format makes this structure explicit, hiding subordinate arguments and details by default, and making it clear which content is related. As an example, a reader may be deep into my discussion of "are centrally hosted AIs at a disadvantage relative to locally hosted AIs in a war setting?" This deep into the weeds, the reader may realize they've lost the thread about how this ties into general AI risk. In a linear text, they would need to find the last heading, or some heading before that; heaven help them if the text is a book with sparse headings and obfuscatory chapter titles5. With the foldout format, it is easy to collapse parent foldouts until one sees a larger point like "multipolar AI is probably not stable".
Structure also seems like an advantage over something like a RAG chatbot; a linear conversation can dive deep and answer specific questions, but it is not easy to keep track of the stack of topics in a conversation. Because of this, it seems like it is easy for a conversation to get lost in the details.
So in theory an explicit foldout structure makes it easier to comprehend a complicated argument.
3. A Comprehensive Deep Dive
Normal writing is subject to the detail dilemma: writers need to balance including details against losing readers with increasingly long-winded writing. Student writers need to be taught to "kill your darlings"; their clever turns of phrase, their devastating sub-sub-sub-arguments, their beautiful details need to be measured against whether they're serving the reader. If the target audience is bored or confused, the detail has to go. Even if a single studious reader would be served by a clarification, it is probably a mistake to include an aside that serves only that one reader.
The foldout format jettisons the need for each detail to serve all audiences, since each reader will decide for themselves how deep they want to go, which allows many more details than would otherwise be possible6. If properly written, one document could serve an audience looking for something short and punchy, but also serve audiences looking for a comprehensive deep dive with all applicable details7.
Why abandon the project?
There were a number of problems that cropped up:
People didn’t like the format
Many early draft readers explicitly wanted a more traditional long-form writing format, which boiled down to removing the foldouts8. This isn't fatal to the project on its own; perhaps the format just needed more work. But I heard only minimal feedback from people who liked the format9, and without such a positive signal, it is difficult to plow past the negative feedback.
My hypothesis is that the format is widely disliked, but there are a few people who really love this sort of thing, including me. I have a few ideas about why people generally don’t like the format:
Theory 1: Active reading is hard
I theorized earlier that the format tends to force readers into an active reading stance. However, “forcing” your readers to do anything is probably bad. Great writing entices readers to engage10. Being forced into active reading sounds much more adversarial, not something you want in the relationship between writer and reader.
An alternative frame is that “choices are bad”, and the foldout format floods the reader with choices around whether to open a foldout or not. Therefore, the format is putting stress on the reader, and should be considered bad.
Theory 2: Outlines are unpopular
The foldout format is naturally suited to outlines: each foldout is summarized by a short claim, backed up at length within the foldout, all structured within a hierarchy of foldouts. However, most of our communication is not through outlines, but through prose. We could argue about why, but it seems clear to me that people don’t like outlines, and I strongly suspect at least some of that dislike spills over to the foldout format, with at least one piece of feedback explicitly asking for more prose-like writing. It might be possible to write for the foldout format in a style closer to prose, but I haven’t put in the work to figure out how.
In retrospect, there already exists something similar to the foldout format: argument mapping. The argument mapping platform Kialo has a similar tree structure and mechanism to unfold high level arguments into details (as an example, see this argument about AI risk). However, argument mapping is a niche interest, even with a long history and attention from luminaries, including Engelbart in 1962, father of "the mother of all demos". Engelbart's computer mouse concept clearly succeeded, while for whatever reason argument mapping clearly didn't. I think this comports cleanly with my notion that this idea really appeals to a small number of weirdos like me, but is otherwise broadly unpopular.
There were other problems with the format.
Kitchen Sink Syndrome
As mentioned previously, the foldout format makes it easy to include every possible detail. This is great for making a work comprehensive, but someone still has to write all those details up. In the drive to be comprehensive, I included everything, including details that might not matter to anyone, and the writing time ballooned. This was exacerbated because…
Writing Is Not My Thing
One major problem was that I don’t actually like writing11. You would think I already knew this, since I have taken 2 multi-year hiatuses from blogging, but I figured that I could power through it. I thought the topic was important, and maybe I was creating something new in the fundamental field of writing! To be fair, these reasons were enough to motivate me to regularly write for the last 2 years, and quitting my job12 gave me the time, but as it turns out 2 years was not enough time to finish the writing I wanted to do.
Phone Mode
Another piece of feedback I received multiple times regarded how the format handles phones. The central problem is that the format concerns itself with deeply hierarchical information. A natural way to display hierarchy is to use indentation, but the indents need to be large to be legible. Modern phone screens are narrow, which means deeply nested legible indents will quickly eat up all available horizontal space, in some cases pushing details off the screen13. The solution I eventually settled on was to simply have each foldout take over the screen to maximize real estate, which also had a sweet and pleasingly simple implementation (Github).
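If it helps to picture the behavior, here is a minimal sketch of that kind of takeover, assuming a plain `<details>`-based page; the breakpoint and inline styles are placeholders, not the actual implementation linked above.

```ts
// Narrow-screen takeover sketch: instead of indenting ever deeper, whichever
// foldout was just opened expands to fill the viewport. Placeholder values,
// not the real implementation.
const NARROW_SCREEN_PX = 600; // hypothetical breakpoint

function enablePhoneTakeover(root: ParentNode): void {
  root.querySelectorAll<HTMLDetailsElement>("details").forEach((foldout) => {
    foldout.addEventListener("toggle", () => {
      if (window.innerWidth > NARROW_SCREEN_PX) return;
      if (foldout.open) {
        // Take over the whole screen rather than adding another indent level.
        foldout.style.position = "fixed";
        foldout.style.inset = "0";
        foldout.style.overflowY = "auto";
        foldout.style.background = "white";
      } else {
        // Hand the space back when the foldout is closed.
        foldout.style.position = "";
        foldout.style.inset = "";
        foldout.style.overflowY = "";
        foldout.style.background = "";
      }
    });
  });
}

enablePhoneTakeover(document);
```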
Unfortunately, phone beta readers hated the full-screen takeover. Before I ended the project, I was seriously considering telling phone readers to use a larger screen, perhaps even blocking them instead of letting them wade through a sub-optimal reading experience. On the other hand, winning in the marketplace of ideas means meeting people where they are, and the people are on their phones.
What a cruel world. A major perk to abandoning the project is not having to resolve this problem.
Lack of expected feedback
I was surprisingly disheartened by the lack of feedback from the wider AI risk community. I released a small incremental portion of the larger project to LessWrong (LW), which I forecast as not being interesting to an LW audience, so it would be fine if it got little traction and attention. However, putting up the post and actually getting little traction and attention, exactly as expected14, was in fact much more disheartening than I had anticipated.
Perhaps one component of the disappointment was that there appeared to be little interest in improving AI risk communications from the community that would be most interested in it, and if there is no interest there, then where? One might say that it’s just because I wrote something beyond salvaging, but the AISafety.info folks in the same time frame were also trying to get feedback and also received minimal feedback15. Perhaps there’s something about reading with intent to give feedback that repels even sophisticated readers16: I myself didn’t give feedback to the AISafety.info folks17. How could I blame other readers for doing the same to me?
All this said, being disheartened (even unexpectedly) wasn’t necessarily a bad thing: it did get me to re-evaluate my assumptions and decide the project wasn’t worth more time and effort.
Other Minor Problems
- 🔗~57 words Best practices for writing need adaptation to foldouts.
  - There's an existing body of best practices for writing straight prose, which seems difficult to transfer directly to the foldout format. For example, since a foldout can be open or closed, foldout text needs to accommodate multiple different continuations. This works in a pure disjointed outline where nothing is expected to flow, but seems challenging with prose.
- 🔗~118 words Talking about works using extensive foldouts introduces ambiguity.
  - Usually saying "I read X" is pretty clear: someone has looked at and comprehended all the words contained in X, even if they have a different interpretation or have forgotten some details. However, saying "I read Y, which uses a foldout format" introduces ambiguity: did you read everything? The format implicitly encourages you to skip parts of it, so there's much less shared context when talking about Y. This also showed up in practice in discussions with beta readers, when I would need to clarify which parts of the work the reader had actually read. This seems like a minor annoyance, but being annoying to talk about is probably not good for the popularity of a work.
- 🔗~87 words Common topics don't necessarily fit into trees of foldouts.
  - The format assumes that arguments and details can be arranged in a tree, but this is not always possible. For example, arguments about AI consciousness should be included in discussions both about AI intelligence and AI volition, because some people believe that consciousness is important for intelligence or volition. I worked around this by having a discussion of AI consciousness in one location and then linking to it in other locations where it came up, but other more densely interconnected content might work better with a wiki format.
Things I should have done differently
In retrospect, I should have done more nothing.
After I quit my job, I was burnt out. I think this goes some way to explaining why I wasn’t strategic. I only took a break for a month after quitting, and then started grinding away at writing. Not much time for mental recovery after a decade at the grindstone18.
If you’re really interested in my thoughts about how I could have been more strategic, I have shoved those into foldouts:
- 🔗~83 words Project planning may have helped me keep the bigger picture in mind.
  - Project planning may have helped discover future road blocks, or highlight that I should come up with ways to test the format. For example, in retrospect I could have tried to release the 1st section of the document stand alone, and probably gotten the same feedback as above, but sooner. It wasn't obvious I could do this at the start, but if I had built in regular checkpoints (another project planning tool!), I could have noticed sooner than later and tried it out.
- 🔗~126 words I could have tried to use a more socially intensive approach to writing, keeping things focused on usefulness for real readers.
  - A more socially intensive approach to building the document would have been to create a minimal outline of the major arguments, and then start fleshing it out, but only with arguments and chains of reasoning found "in the wild" from discussions with people from all sides. This would have prevented some of the potential navel gazing problems I mentioned earlier, and may have highlighted problems with the foldout format earlier from intensive feedback. However, I'm not sure I could have pulled this off even without any burn out problems: telling the introvert to talk to everyone all the time sounds like a recipe for disaster. It's even worse than this, perhaps something like "tell the introvert to get into Twitter arguments with everyone all the time".
- 🔗~129 words I could have tried to separate proving out the foldout format from writing about AI risk.
  - One other thing I could have tried was to separate out testing the foldout format from the AI risk writing. For example, I could have adapted an existing set of arguments into the foldout format, like this climate FAQ, and gotten feedback about whether the format was useful or not. The problem is that climate change is highly politicized, which may have made feedback hard to evaluate, and I don't know of any other similar compilations of arguments that would be easy to convert. On the other hand, doing this may have gotten me to wonder why existing argument mapping software was not already popular.
    - I think AI risk is becoming politicized, but even today it doesn't seem as politicized as climate change. Maybe give it a few years?
In summary, the foldout format and ideas like it are compelling traps, which probably don’t work for general communication19.
What’s next?
Probably going back to work; by this point slinging code should be a nice change of pace. We’ll see!
Thanks for reading!
If not here, where should one read about how AI might kill us all?
Shorter explainers, usually geared towards a popular audience.
- FAQ on Catastrophic AI Risks (June 2023), by Yoshua Bengio, 2018 winner of the Turing Award, and sometimes called one of the “Godfathers of AI”.
- The ‘Don’t Look Up’ Thinking That Could Doom Us With AI (April 2023), by Max Tegmark, MIT professor. You might know of him because of his multiverse theory.
- The basic reasons I expect AGI ruin (April 2023), by Rob Bensinger.
- AI Could Defeat All Of Us Combined (June 2022), by Holden Karnofsky. Tries to take superintelligence out of the equation.
- The AI Revolution: Our Immortality or Extinction (January 2015), by Tim Urban of popular blog Wait But Why. Mostly included to show that these trains of thought have been around for a while20; OpenAI was founded later that year in December 2015.
Longer explainers with more detail.
- AISafety.info has a series of articles that sets out the case for AI risk. The approach it takes is broadly similar to my own work, although there seems to be less detail than I was aiming for (understandable, since they're subject to the detail dilemma). There is no single publication date; it seems the authors are maintaining the resource as a living document.
- AI 2027 (April 2025) tries to justify an aggressive AI timeline. It is available to read as a 71 page PDF (including all the appendices). Note that the name is somewhat outdated: since publication, at least one author has said that timelines aren't quite as short as 2027. This document sparked much discussion, but there's no central list of reactions.
- The Compendium (December 2024, the last update at time of writing; it seems to have been meant as a living document), available as a 113 page PDF.
- Is Power-Seeking AI an Existential Risk? (June 2022) by Joseph Carlsmith, available as a 57 page PDF (~27k words based on a docs draft). Also see this list of reviews of the document, some of which disagree with the author.
- Set Sail For Fail? On AI risk (August 2022) by Nintil, 24k words. It rejects portions of the “orthodox” AI risk paradigm while arguing that a different flavor of AI risk is more plausible.
I agree with the notion that "doom" should be reserved for the specific position that AI is almost certain to cause human extinction and that there is nothing we can do about it, which is an extreme position. I will note that AI doom/doomers is sometimes used as a pejorative for general worry, as a way to paint any amount of worry as unreasonable. AI safety is a somewhat confused term now; it could encompass extinction, but it could also refer to ethical concerns (are LLMs biased?) or brand safety (is this LLM talking about sex? Gross!). AI dontkilleveryoneism is incredibly clear and unlikely to be co-opted, but it sounds dumb. In this post I'll use the term AI risk to talk specifically about human extinction risk, even though there are many other risks from advancing AI that are not extinction risk.↩︎
It still feels like a misnomer to call SWEs engineers: it’s not like SWEs are required to take ethics classes, or accept liability for their work like professional engineers.↩︎
It is possible that my lack of commitment ensured failure; making something evergreen means less tailoring for the current situation, which removes the need for ongoing effort but at the cost of how compelling the post is. One could imagine targeting something between "completely static" and "present concerns only" writing like Zvi's weekly AI news roundup. Still, I knew from previous experience that I find it very difficult to keep maintaining something if I'm not being paid for it, even if it's theoretically easy, so committing to maintenance would have been a terrible idea.↩︎
I used AI Doom to refer to AI Risk in that post, but decided to use AI Risk here. Maybe I really am not cut out to be a writer?↩︎
For example, simple numbered chapters like "Chapter 5", or broad chapter titles like "On Warfare", or obfuscatory titles like "The Hungry Mongoose Leaps into the Urn", which is nonsensical until you read 80% of the chapter and it turns out to be a reference to a clever quote that perfectly sums up the chapter. People never forget things, so the title is perfect.↩︎
That said, adding additional detail is not completely costless: adding more details to a deeply nested foldout will add more cognitive load to readers who reach that foldout. However, in pop comp-sci terms it'd be more like O(ln N) instead of O(N) in cognitive load for picky readers.↩︎
I was especially interested in writing a resource that would cover a variety of basic assumptions. For example, a major point of contention around AI risk is whether intelligence is actually important for humans. Instead of only assuming one viewpoint or the other, I wanted to present both cases, and argue that they both lead to AI risk. This sort of subjunctive juggling would normally be a huge mess, but being able to shunt the mess into a foldout would allow the high level argument to stay coherent and cohesive.↩︎
I’m reading between the lines of feedback similar to “can I simply view everything?” That is, the foldouts were annoying enough to request some way to work around them, but not so annoying that readers could pinpoint “X is super annoying” or “X is annoying for Y reason”.↩︎
While working on the explainer I didn't hear any positive feedback. After killing the project but before I finished writing up this post, I did hear from a beta reader that they liked the format. It's possible that there are more silent foldout format enjoyers, but it still seems compatible with my theory that the format only appeals to a small number of people.↩︎
One way this isn’t true is if you’ve heard about a difficult work by word of mouth. In this case the reader might have a terrible experience, but they are willing to put in the work on the word of an outside authority. It is technically possible this could have happened to my work, but it’s nothing to bet on.↩︎
Maybe I should have spent the last 2 years trying to induce a "virtuous cycle" similar to Scott Alexander's.↩︎
I didn’t quit my job just to write, but it was an upside: I finally had time to do that writing I wanted to do! Or, you could be a bit more cynical and say I needed more time to hype myself up and overcome my natural dislike of writing.↩︎
It is possible to ask phone readers to turn their phones to landscape, except #1 that's kind of a big ask, and #2 now you can barely see anything vertically.↩︎
Long time LW readers might object that the post didn’t do all that badly. At time of writing (2025-09-17) the post was at +7 karma and had 6 comments, which is a pretty good amount of engagement, more than many other posts get. As an example, I would estimate that half of the recent posts on LW have no comments at all. However, I was mostly looking for feedback around the foldout format, and if we disregard the comment by my fellow writer of AI risk explanations, the comments were all about the ideas instead of the formatting.↩︎
I did botch the marketing of my post, but the fact that the AISafety.info folks did a better job of their asks and still got minimal traction makes me think that improving my marketing would not have helped.↩︎
Directly requesting feedback from specific people obviously works, and in my experience requesting feedback from a local mailing list will garner feedback as well, but posting to a general social media-esque site doesn’t. It seems obvious that people will gravitate towards discussing new ideas, not towards the grungy work of giving feedback, and I can hardly begrudge them for that.↩︎
One of the AISafety.info authors did drop a comment on my post; unlike me, they understood the value of solidarity.↩︎
On the other hand, taking only a month break may have worked if the project scope was smaller: I did write fairly regularly for 2 years, and produced a lot of text. It just wasn’t enough for the scale of the project.↩︎
This example of a heavily folded/outlined article which is nonetheless popular might be a counterexample; perhaps it points to the format being suitable for references, and not general arguments? Perhaps it just did better karma-wise because it wasn’t asking for feedback?↩︎
2015 is not the oldest we could go by far: searching LessWrong for “friendly AI” turns up a post from 2007, and I’m fairly certain that there was previous terminology that is even older. It would be interesting to write up a history of such terminology, just to have it handy.↩︎