Tax Charity Research

Epistemic status: amateur effort, on the order of half a day of casual investigation. Accidentally failed to do scholarship properly[1]. Possibly accepts bad premises[2].

It’s story time.

So tax season rolls around, and you wonder why taxes are so damn complicated, to the point that paying someone or something to help with your taxes is an attractive proposition. You would like to save the money you would spend on tax prep, but slogging through worksheets of impenetrable bureaucratese is bad enough that you’ll pay for accountants or tax software.

Now look at Sweden, which has return-free filing. The government sends people their prefilled return, the people take a look and sign off if it’s correct, and for most people it takes minutes, instead of the hours it took me to do my taxes[3]. You only need to pay for tax help if you want to do weird things.

Now look back at my country: the large tax preparation conglomerates have an incentive to oppose return-free filing, an existential threat to their business model. If they can both keep return-free filing from being implemented, and ensure the tax code is complicated enough that people have to get help, then they can push people to give them money with malicious UX.

With only mild public will to save a few hours a year, and an industry lobby to ensure those few hours stay unsaved, the outcome seems obvious. Companies will lobby for a bigger tax prep market, get money from the bigger market, and then lobby some more, ad infinitum.

It’s a compelling story, but I haven’t done the rigorous legwork of figuring out how true it is. However, for this post I will accept it as true[4], so we can focus on what happens next.

In particular, I pay for tax prep software but feel bad about feeding the grand tax prep loop[5]. What can I, a private citizen, do to nudge the nation towards getting an IRS that can implement magical return-free filing?

In this case I’m going to be lazy, not think too hard about ways to take more direct action[6], and simply look for nonprofits already operating in this space.

Summary

I mildly endorse the Institute on Taxation and Economic Policy and the Tax Policy Center. Both are either aligned with my politics or neutral. Unfortunately they are both general tax non-profits, with only tangential focus on return-free filing.

Non-profit structuring

A quick note on nonprofit structures: a 501(c)(3) organization is the usual nonprofit, with donations to the organization being tax deductible. However, 501(c)(3) organizations have the restriction that they can’t attempt to directly influence legislation or campaign for candidates. Instead, explicitly political organizations are covered under 501(c)(4) (social welfare organizations), for which donations are not tax deductible.

So, a somewhat common structure seems to be a split between a backend 501(c)(3) organization that does the non-partisan analysis and support work, and a sibling 501(c)(4) that pushes specific policies when the analysis points that way. This doesn’t seem to prevent the 501(c)(3) organizations from being clearly tilted one way or another: one of the organizations below I donated to stated “we’ll show how proposals to restructure or dismantle progressive income taxes will affect people across the income spectrum”, maintaining just that shade of plausible deniability[7].

However, with the recent tax bill restructuring, it’s a lot harder to exceed the standard deduction ($12000), so if you weren’t going to itemize anyway, maybe it’d make sense to donate to 501(c)(4)s instead. Personally, I’m trying to blow past the standard deduction, so I’m primarily interested in the 501(c)(3)s, but I’ll note when there is a 501(c)(4) associated with the organizations below.

Methodology

My methodology was super non-rigorous, politically biased, and not terribly in-depth[8].

  1. I drew from notes accumulated over the past year, which I made whenever I ran across a possible tax-related non-profit.
  2. I searched Google for other sources I may not have stumbled across randomly, and that were prominent enough to be found with search terms like “tax nonprofit”.
  3. I searched Charity Navigator for charities related to taxes, skimming the list and cherry picking the ones I thought looked good.

Once I got a list of charities, I tried to answer some questions:

  • Are they a 501(c)(3) or a 501(c)(4)? Remember, donations to one are tax deductible, donations to the other aren’t.
  • What sort of work do they do? As we’ll see, there are some non-profits doing noble work, but it’s work that isn’t addressing the root causes.
  • What do their financials look like? For example, if they’re sitting on lots of cash, it’s less pressing to donate to them.
  • Do their goals align with mine?

Ideally, I would do a quasi-GiveWell-ian impact analysis, convert everything to something like QALY/$ but for something like “policy movement/$”[9], and figure out which charities were doing the best work and had funding shortfalls. However, I have neither the skill nor the time to do that, and (mumbles something about perfect being the enemy of good).

What I’m donating to

Tax Policy Center (TPC)

  • Both parent institutes are 501(c)(3), so donations are tax deductible.
  • They mainly do modeling work, as well as producing educational materials. I like the high-level suggestions in 10 ways to simplify the tax system because they acknowledge that there are trade-offs to these simplifications. They also do bog-standard economic education, like how plastic bag taxes work, but an educational center can’t live on more advanced concepts alone.
  • The TPC is a… joint sub-institute?… of the Urban Institute and the Brookings Institution. Both are well off: the Urban Institute has $101M in income (Charity Navigator entry), and the Brookings Institution has $108M in income (Wikipedia, 2016; weirdly, I couldn’t find them on Charity Navigator). Unfortunately, it’s not clear how donations/assets are allocated to the TPC specifically.
  • The materials produced by the TPC are not clearly partisan: their reporting on the TCJA was remarkably even-handed. They do mention return-free tax filing[10], but it doesn’t appear to be a core pillar of their agenda. This isn’t such a big problem, because no one makes return-free filing a core part of their agenda. Additionally, per Wikipedia, both parent institutes are regarded as not especially partisan.

So, the TPC seems to be doing even-handed policy analysis work, with the downside that it is hosted by institutions already well funded relative to other charities.

TPC’s donation page; you may need to specify that you are earmarking a donation for the TPC when donating to either parent institute.

Institute on Taxation and Economic Policy (ITEP)

In short, ITEP is more focused specifically on tax policy, with a progressive bent I generally support, and fewer resources than TPC (especially its 501(c)(4) sibling, Citizens for Tax Justice (CTJ)[11]).

ITEP’s donation page.

Tax Help

Tax Aid, Community Tax Aid

There’s a class of 501(c)(3) organizations focused on helping local low-income folks fill out their tax returns. It’s a noble cause, but it’s not getting at the root of the problem, which is that they have to fill out returns at all.

Wut-level Charities

These are charities that are confusing in some way, or have goals inimical to mine.

Tax Analysts

  • Tax Analysts are a 501(c)(3) organization.
  • They produce tax analysis briefs, as their name implies.
  • For a tax-focused charity, they have a ton of money: $68M in assets, and $48M in income (Charity Navigator).
  • The briefs and positions within seem even handed, so my problem is not with the position of the charity[12]. No, it’s with Tax Notes: it appears that the charity is somehow linked to a subscription service for tax briefs, which is provided to tax professionals and other parties interested enough to pony up thousands of dollars for analyses. For example, “Tax Notes Today” is $2,500 annually. I’m confused about whether Tax Notes is feeding into the Tax Analysts income above, which would make sense given their large asset pool. Working under that assumption, it seems like Tax Analysts don’t need my money.

Americans for Tax Reform (ATR)

This is one of the first results that show up if you search for “tax charity”.

However, the ATR’s main (only?) goal is to lower taxes, period. Their Taxpayer Protection Pledge page is full of GOP pull quotes, making it clear who their demographic is, as if the banner “4,000,000 Americans (and counting) will receive Trump Tax Reform Bonuses” (complete with “Click here to see the employers paying bonuses!”) wasn’t clear enough[13].

I mean, I guess the maniacal focus on LOWER TAXES is refreshing in its clarity, but that’s not what I want.

Tax Foundation

The Tax Foundation is a 501(c)(3), and is correspondingly less on the nose about its target demographic than ATR. However, there are some clear indicators of which way the Tax Foundation leans: the article “Tax Reform Isn’t Done” talks about making provisions of the recent Tax Cuts and Jobs Act permanent[14], and their donation page has a pull quote from Mike Pence.

Tax Council Policy Institute (TCPI)

So the TCPI is a 501(c)(3) (Charity Navigator), but there’s no donation page on their website. What? What charity doesn’t want your money?

Their about page makes it clear that the TCPI is affiliated with The Tax Council, whose home page carries the quote “Our membership is comprised of (but not limited to) Fortune 500 companies, leading accounting and law firms, and major trade associations.” In other words, they don’t need your dinky public donation, because they have industrial support.

Even if they did accept donations, a part of The Tax Council’s mission is “… contributing to a better understanding of complex and evolving tax laws…” with nary a note about simplifying those tax laws, or at least simplifying how people do their taxes.


[1]  What I should have done is look for other people trying to answer the same question, especially in an EA style. I did not do this, partly because honestly, I didn’t really expect a strong showing, and partly because I had just finished doing my taxes and I didn’t want to keep doing research. I would appreciate it if you let me know about stronger posts/guides on this topic.

[2]  But is it ever impossible for me to not accept bad premises?

[3]  This is a little disingenuous: I expect most people have simpler returns than I do. If I only had one W-2, I would have spent much less time on my taxes.

[4]  That said, I would be shocked if the balance of evidence worked out that return-free filing was negative for American citizens.

[5]  Not using the tax prep industry seems like an obvious first step, except I would be piling a lot of suffering on myself for little gain. I usually check my federal return numbers with Excel 1040, but this year is the first time I couldn’t get my tax return within the same ballpark as the numbers given by the tax prep software. I could have sacrificed a weekend to figure out what was going on, but fuck that.

[6]  This is your periodic reminder that action space is really wide, and doing the lazy thing is sometimes much less effective than doing any direct action.

[7]  “We never said that the effects would be bad” seems to be the implied response to people charging it with partisan mongering.

[8]  I’ll probably spend more time writing and editing this post than I will have spent on actual research.

[9]  Yes, QALYs are weird, and the GiveWell approach is vulnerable to the streetlight effect. Understood.

[10]  The article also references Elizabeth Warren’s return-free bill, which does raise the question of why I don’t donate directly to Elizabeth Warren. I remember her advocating for weird policies, but apparently my go-to dumb policy I thought she backed was her anti-vax position, which was either blown out of proportion or reversed at some point. So, basically no reason.

[11]  Unfortunately they don’t seem to have their IT locked down tight, since I found an almost certainly surreptitious CoinHive install on their site.

[12]  Even if the articles can be inanely focused on the minutiae of policy: when I was doing my research, the Tax Analysts featured articles list was full of articles about the grain glitch, a tax loophole.

[13]  Plug for Sarah’s post about the intertwining of politics and aesthetics, Naming the Nameless, which partially explains why the ATR uses language usually reserved for last-generation clickbait and aggressive ads.

[14]  I haven’t really been following along with the TCJA, and don’t have a strong opinion on the specific policy changes, so it’s more of a gut-level identity-based dislike of the support of the TCJA. Yes, yes, this is why we can’t have nice things.

Review/Rant: The Southern Reach Trilogy

Warnings: contains spoilers for Annihilation, Authority, Acceptance, The Quantum Thief, The Expanse, the Laundryverse, Dark Matter, and SCP (as much as SCP could be said to have spoilers). Discussion of horror works. Otherwise contains your regularly scheduled science fiction rant.

I recently[1] blew through Jeff VanderMeer’s Annihilation/Authority/Acceptance series, also known as the Southern Reach Trilogy, which I’ll abbreviate to SRT.

First things first: overall, it was pretty good. I enjoyed the writing, the clever turns of phrase (“Sheepish smile, offered up to a raging wolf of a narcissist.”[2]). It’s reasonably good at keeping up the tension, even while sitting around in bland offices with the characters politicking at each other.

So the writing is alright, but the real draw was kind of the setting, kind of the story structure, kind of the subject matter. In a way, it’s right up my alley. It’s just a… weird alley.


The most obvious weird is used as a driving force in the world building, forcing us to reconsider what exactly we’re reading.

Is this an environmental thriller? Kind of, but the environmental message is muted and bland, restricted to a repeated offhand remark “well, too bad the environment is fucked”. Is this an X-Files rip off? Kind of, but the paranormal is undeniable: you don’t want to believe it’s there, you want to believe there’s an explanation behind it all. Is this a romance? For the first book maybe, but with one of the pair entirely absent from the book[3]. The second book doesn’t help by introducing elements of the corporate thriller genre, and then axing any chance of finishing that transition by the end of the book.

Whatever it is, the SRT is world building all the way through, but shot through with twists and turns. It reminds me of those creepy dolly zooms (examples) which undermine the sense of perception, but applied to narrative. For example, the biologist and the story at large constantly give up information that forces us to reconsider everything that came before:

  • By the way, my husband was part of the previous expedition.
  • By the way, there were way more expeditions than 12.
  • By the way, the danger lights don’t actually do anything[4].
  • By the way, I (the biologist) am glowing.
  • By the way, the 12th expedition psychologist was the director of Southern Reach.
  • By the way, said director was in the lighthouse picture.
  • By the way, Central was involved in the Science and Seance Brigade.
  • Did I mention Control’s mom was in the thick of it?

It’s sort of like Jeff is giving us an unreliable narrator with training wheels: we’re not left at any point with contradictory information, yet there’s a strong sense that our only line into the story is controlled by a grinning spin doctor. It’s an artful set of lies by omission.

My suspicion is that I enjoyed this particular aspect of the SRT for the same reason I enjoyed The Quantum Thief trilogy. Hannu does a bit less hand holding[5], like starting the series with the infamous cold open “As always, before the warmind and I shoot each other, I try to make small talk. ‘Prisons are always the same, don’t you think?'”[6]. And as an example, the trilogy never explicitly lays out who the hell Fedorov is: in fact, I didn’t even expect him to be a real person, but his ideology (or the Sobornost’s understanding of his Great Common Task) was so constrained by the plot happening around it that I never had to leave the story and, say, search Wikipedia, which was excellent story crafting. Anathem is another book that does this sort of “fuck it we’ll do it live” sketching of a world to great effect[7].

But while The Quantum Thief is sprinting through cryptographic hierarchies and Sobornost copy clans, it’s still grounded in a human story. The master thief/warrior/detective tropes serve as a reassuring life vest while Hannu tries to drown us with future shock[8][9]. The SRT doesn’t need as much of a touch point, since we never leave Earth and instead bum around a mostly normal forest and a mostly normal office building[10], but the organizational breakdowns in the expedition and the Southern Reach agency are eminently relatable in the face of a much larger and stranger unfolding universe.


Let’s unpack that unfolding universe.

The world of SRT is weird: while The Quantum Thief is a fire hose, it only spews the literary equivalent of water, easily digestible and only murky in tremendous quantities. The SRT finishes with loose ends, the author at some point shrugging his shoulders and leaving a dripping plot point open for the spectacle of it, and that’s okay. It’s weird fiction.

Another parallel: Solaris describes a truly alien, world-sized organism. What is it thinking? How does it think? How do you communicate with it? The story ends with all questions about the planet Solaris unresolved, with the humans only finding out that broadcasting EEG waves into the planet does something[11]. No men in rubber suits here, just an ineffable consciousness. Even a hungry planet makes more sense to us: at least it has visible goals that we can model (even if they are horrifying[12]).

You end up with the same state in SRT: what is Area X doing? Why is it doing it? What the hell does the Markov chain sermon mean?[13]

I’m guessing this is why people don’t like it: there are barely any answers at the end. How did turning into an animal and leaping through a doorway help at all? Did Central ever get their shit together? What’s up with the burning portal world? If you were expecting a knowable “rockets and chemicals” world, it’d be disorienting.

In a way the story suffers a bit from a mystery box problem, where there are boxes that are never opened. However, in this case I think the unopened boxes are unimportant. Sure, the future of humanity is left uncertain, the mechanisms of Area X are still mysterious, but we know what happened to all the main characters, see how they played their parts and have some closure.

(I am miffed that J. J. Abrams is poisoning the proverbial storytelling well. Yes, mystery boxing makes economic sense, but now I see the mystery box the way I hear the Wilhelm scream, and it’s not pretty.)


Okay, so we have a weird new world we explore, and weird fiction that is weird for the sake of being weird, but I’m neglecting the weird that gives people bad dreams.

On one level there’s simple horror based on things going bump in the night: think of the moaning psychologist in the reeds, or the slug/crawler able to kill those that interrupt its raving sermon. But that doesn’t show up in spades: the description of the 1st expedition’s disintegration cuts off after a sneak peek, omitting most of the ugly details. Jeff had plenty of opportunity to get into shock horror, and didn’t.

I think that he instead wanted to emphasize the 2nd layer of Lovecraftian horror beyond the grasping tentacles, a horror driven by a tremendous and possibly/maybe/almost certainly malign world[14]. Area X casually pulls off impossible feats like time dilation and a barrier that transports things elsewhere (or nowhere). More concerning is the fact that Area X knows what humans look like. It’s an alien artifact, and somehow (something like the Integrated Information Theory of consciousness turns out to be right?) knows what makes up a human, recognizes them as special and in need of twisting, and can’t help but twist with powers beyond our understanding. There’s something large and unspeakably powerful stalking humanity, and it is hungry.

Or maybe it’s not deliberately stalking humanity, and it’s just engaging sub-conscious level reactions, and everything it has done so far is the equivalent of rolling over in its sleep: how would Area X know it just rolled over a butterfly of an expedition? This implies a second question: what happens when it finally wakes up?

It all reminds me of The Expanse series. Sure, there’s the radically simplified political/economic/military squabbling and made-for-action-movie plot, but the protomolecule is what I’m thinking about. “It reaches out it reaches out it reaches out”: an entire asteroid of humans melted down for spare parts by the protomolecule is kept in abeyance for use, living and being killed again and again in simulation until the brute force search finds something useful happening (which in turn reminds me of the chilling line “There is life eternal in the Eater of Souls”). Thousands die and live and die, all to check a cosmic answering machine.

If we want to draw an analogy, the first level of horror draws from being powerless in the face of malign danger: think of the axe murderer chasing the cheerleader. The second level of horror draws from the entirety of humanity being powerless in the face of vast malign danger. Samuel L. Jackson can handle an axe murderer, but up against the AM from “I Have No Mouth and I Must Scream”? No contest[15].

(We could even go further, and think about the third level as malign forces of nature: Samuel L. Jackson vs the concept of existential despair might be an example, not on the level of “overcoming your inner demons” but “eradicating the concept as a plausible state of mind for humans to be in”[16]. Now that I think about it, it would have been an interesting direction to take The Quantum Thief’s All-Defector, fleshing it out as a distillation of a game theoretic concept like Moloch. Maybe there’s room for a story about recarving the world without certain malign mathematical patterns… well, maybe without religious overtones either.)


But we’ve only been looking at what the rock of Area X has been doing to the humans. What about the hard place of the Southern Reach agency, and what they do to humans? The agency continually sends expeditions into a hostile world, getting little in return, and pulls stunts like herding rabbits into the boundary without rhyme or reason. In the face of failure to analyze, they can only keep sending people in, hoping that an answer to Area X will pop back out if they just figure out the right hyperparameter of “which people do we send?”.

In other words: a questionably moral quasi-government agency, operating from the shadows to investigate and prepare to combat an unknown force that might destroy all of humanity? And as if that wasn’t close enough, the SRT throws in the line “What if containment is a joke?”, and I almost laughed out loud. It’s all a dead ringer for the Foundation in the SCP universe.

A little background: SCP is one of those only-possible-with-the-internet media works[17], a collaborative wiki[18] detailing the workings of the Foundation, an extra-governmental agency with an international mandate to, well, secure, contain, and protect against a whole bevy of anomalous artifacts and entities. SCP. As with any wiki, there is an enormous range of work: some case files detail tame artifacts (a living drawing), or problems solvable with non-nuclear heavy weapons (basically a big termite), or with nukes (a… living fatberg?), or something a 5-year-old might come up with if you asked them to imagine the most scary possible thing (an invincible lizard! With acid blood!).

And then there are things a bit more disquieting. Light that converts biological matter to… something else. Infectious ideas. An object that can’t be described as it is, just as it is not (it’s definitely not safe).[19]

Area X slots into this menagerie well, an upper tier threat to humanity. It’s utterly alien and unpredictable, actively wielding unknown amounts of power to unknown ends. With the end of SRT, it seems likely that an “XK Class End of the World scenario” is in progress, a real proper apocalypse pulling the curtains on humanity.

On the other hand, the Southern Reach/Central agencies are vastly less competent at handling existential threats than the Foundation (this, despite a mastery of hypnosis the Foundation would kill for[20]). Part of it is the nonsensical strategy: for crying out loud, Central sends a mental weapon in to try and provoke Area X, and to what end? To hasten the end of the world? Then Lowry gaining control of the Area X project was absolutely atrocious organizational hygiene, a willful lack of consideration that contamination can go beyond biological bacteria and viruses, that the molecular assembly artifact under study can change your merely physical mind. An O5 Foundation overseer would have seen dormant memetic agents activate and rip through departments, and would have taken note of a field agent turned desk jockey who started accumulating more and more soft power in the branch investigating the same anomaly that nearly took his life…

Back to the first hand: both works partly derive their horror from the collision of staid and sterile office politics with the viscerally supernatural. Drawing from the savanna approximation, we weren’t built to work in cubicles, and there were definitely no trolleys, much less trolley problems[21]. Office organizations are unnatural, but they are the most effective way we’ve found to get a great many things done. So press the WEIRD but effective organizational tool into service to call the shots on constant high-velocity, high-stakes moral problems, except it’s not people on the tracks but megadeaths, and you start to get at why it’s so unnerving to read interdepartmental memos about how to combat today’s supernatural horror[22].

And there’s the “sending people to their death” aspect of both organizations, which conflicts with their nominally scientific venture: at least no one pretends the military hierarchy is trying to discover some deeper truth when it sends people into battle. So the faceless bureaucracy expends[23] their people[24] to chart the ragged edges of reality[25], and gets dubious returns back. The Southern Reach gets a lighthouse full of unread journals, the Foundation usually just figures out yet another thing won’t destroy an artifact of interest.

And as an honorable mention, the Laundryverse by Charlie Stross shares strong similarities with both works: Lovecraftian horrors are invokable with complicated math, the planets are slowly aligning, and world governments have created agencies to prepare for this eventuality, deal with “smaller” “supernatural” incidents, and find/house the nerds that accidentally discover “cosmic horror math”. This series leans a bit more on the humorous side of office hijinks, and focuses on threats a bit more tractable to the human mind: at least many of the threats Bob faces can be hurt with the Medusa camera he carries around.

If you want a taste of the Laundryverse, you could do worse than the freely available Tor stories (Down on the Farm, Overtime[26], Equoid (gross!)[27]), or the not-really-Laundryverse-but-pretty-damn-similar A Colder War[28], in which I remember Stross being inordinately pleased to include the line “so you’re saying we’ve got a, a Shoggoth gap?”.


In the end, I wasn’t entirely horrified: the best SCP has to offer rustled my jimmies more than Area X did. And the Laundryverse is somewhat more entertaining than the SRT. And Solaris does the “utterly alien” alien a bit better. The SRT, though, strikes a balance between all these concerns, has much better writing quality than SCP, and has fewer of the hangups that turned me off The Expanse[29].

But let me rant for a bit.

On Goodreads Annihilation has an average 3.6 score. I personally don’t think it deserves such a low score, but a fair number of people were turned off by the characters, it’s not everyone’s cup of tea, okay sure fine.

Dark Matter, a nominally science fiction novel, has a 4.1. 4.1! I only see acclaimed classics and amazing crowd favorites with those sorts of scores.

The problem is that Dark Matter is FUCKING TERRIBLE. I know, I complained about this before (on my newsletter), and I’ll complain again, because it’s a fucking travesty that Annihilation got relegated to bargain bin scores compared with an utterly predictable story with trash science and characterization so bland doctors prescribe it when you are shitting your brains out due to a norovirus infection[30].

Maybe I can say it another way:

Where lies the darkness that came from the hand of the writer I shall bring forth a fruit rotten with the tunnels of the worms that shine with the warmth of the flame of knowledge which consumes the hollow forms of a passing age and splits the fruit with a writhing of a monstrous absence which howl with worlds which never were and never will be. The forms will hack at the roots of the world and fell the tree of time which reveals the revelation of the fatal softness in the writer. All shall come to decide in the time of the revelation, and shall choose death[31] while the hand of the writer shall rejoice, for there is no sin in writing an action plot that the New York Times Bestseller list cannot forgive[32].

Again, a fucking travesty. Christ.


[1]  Not so recently by the time this post is published. I’m still a slow writer.

[2]  Okay, it’s a little too clever for its own good.

[3]  Surely there is Control/Grace rule 34. Or anyone/thousand-eye mutated Biologist. But as far as I know Biologist-husband is the only canon pairing.

[4]  I almost forgot these were a thing while reading Annihilation, so a quick refresher: “… a small rectangle of black metal with a glass-covered hole in the middle. If the hole glowed red, we had thirty minutes to remove ourselves to ‘a safe place.'”.

[5]  If you want a flavor of the info dump sort of style of The Quantum Thief, I recommend “Variations on an Apple” as an even more extreme example: I suspect that normal people feel the same way reading The Quantum Thief as when I first read that story.

[6]  Except where SRT slowly reveals the unnaturalness of the world, The Quantum Thief revels in it, fills the tub with weird and takes a luxurious bath. Like, it seems like Hannu tried really hard to get the “Toto, I don’t think we’re on Earth anymore” senses tingling right in the first sentence.

[7]  Well, if you’re willing to put up with/enjoy the made up words.

[8]  I mean, I do wonder if the author was too bad of a writer to pull off something less stereotypical while retaining the alien world, but maybe it was intentional. Sure, the writer has written some cringeworthy stuff (I never knew someone could string together the word “kawaii” so poorly), but that’s what the internet has given us, government officials with a publicly available teenager history.

[9]  Charlie Stross has more thoughts about drowning people with future shock as a genre, namely that it isn’t productive any longer because we’re already in a (future?) shocking world.

[10]  Breathing cafeteria wall notwithstanding.

[11]  Because EEG is somehow magical? Well, Solaris was written in the 1960s, so some amount of leeway is necessary. But even if you replace the EEG with some other brain state, you have to wonder what exactly Solaris would be doing with it… “Data can’t defend itself” and all that.

[12]  Another alternative is the cactus that doesn’t lift a finger to attain stated goals.

[13]  It turns out to be surprisingly understandable once you finish the trilogy, even if it reads like a digested Old Testament.

[14]  Yeah, we’re ignoring the icky parts of Lovecraft.

[15]  I’m ignoring the fact that any movie plot would somehow have Samuel L. Motherfuckin’ Jackson end up the winner: it’s too bad that our widely known “tough guy” archetypes are all actors, which then implies the presence of Hollywood plot armor.

[16]  General memetic hazards might be another example: Roko’s Basilisk is a shitty example of one.

[17]  Other examples I know of are Football in the Year 17776 (previously), Deep Rising (a little less so, it’s just a comic+music), Homestuck (a little less so, it’s just walls of text+animations), and every piece of interactive fiction: for example, Take (and spoiler-ific analysis).

[18]  It seems almost like a fandom that didn’t coalesce around an existing body of work/author, one that just birthed into the void without a clear seeding work.

[19]  This isn’t the best that SCP has to offer. It’s just that there’s so damn much of it, and it’s not like I’m keeping records on which pages are the best.

[20]  A good life heuristic: if the Foundation would kill to get some capability, maybe you should rethink trying to get that capability.

[21]  But maybe we don’t want to be good at solving trolley problems?

[22]  The dispassionate Foundation reports are effective at conveying the sense of wrongness. There’s a brutal rhythm to the uniform format, leaving a feeling that in order to fight the monsters out there we had to suppress our humanity until we became monstrous in our own way.

[23]  Interesting yet morbid comment: “Well, you were properly expended, Gus. It was part of the price.”.

[24]  New head canon (if such a thing could be considered to exist in the SCP-verse): the replication crisis was suppressed by the Foundation to maintain the facade of the Milgram obedience experiment, which is useful for subconsciously convincing D-class they will eventually follow orders.

[25]  Line stolen from qntm‘s Ra (chapter link).

[26]  The frame story is a bit eye roll inducing, but I understand a man’s gotta publish.

[27]  No, really, it’s gross. Stross: “Stross explains his idea about the life cycle of unicorns to Scalzi and Anders. When he stops retching, Scalzi’s body language changes until it eerily matches Anders. ‘Don’t call us, we’ll call you,’ he says with icy-sober politeness, and beats a hasty retreat.”.

[28]  Home to my go-to chilling quotes “There is life eternal in the Eater of Souls” (previously referenced) and “Why is hell so cold this time of year?”.

[29]  Namely, the incredibly simplified politics and anti-corporation messages set up puppet villains that aren’t interesting: I’d be more into it if the trade offs were more nuanced. It’s still a good “Holden and friends fly around and have adventures” series, though.

[30]  The BRAT diet is bland for a reason: ask me how I know this!

[31]  No, not being emo here: the clones of the main character of Dark Matter (don’t make me look this up, please) end up choosing to fight each other because they can’t figure out functional decision theory. This would be fine, if the main character weren’t ostensibly eminent physics professor material.

[32]  Everything is based on some correspondence with what I actually mean, which fits with what Jeff VanderMeer also did with the original “strangling fruit” prose.

Making the Most of Bitcoin

Epistemic status: I believe I’m drawing on common wisdom up to part 5. After that I’m just making shit up, but in a possibly interesting way. Not proper financial advice, see the end of the post.

So let’s say you have some Bitcoin. What do you do with it?

#1. Cash out everything immediately

Lots of people think putting your money in Bitcoin is a bad idea: Jack Bogle (founder of Vanguard), Warren Buffett, Robert Shiller (Yale economics professor), Mr. Money Mustache, and Jason Calacanis (angel investor)[1]. I tend to agree with them[2], and am basically following this action by not buying in[3].

However, you (hypothetical Bitcoin holder) already knew that Bitcoin was widely thought to be not the greatest investment vehicle, and bought in anyways. You’re not going to immediately cash out, ok, fine, whatever. What else could you do?

#2. Become a HODLR

You’re going to HODL the Bitcoin you have until it reaches THE MOON. It’s unclear what you’ll do once it reaches THE MOON. Maybe you’ll just slowly squander your satoshis on breeding Shiba Inus and kidnapping cryptography experts to ensure the sanctity of SHA-256.

Or maybe one day you’ll end up with 99% of your net worth in Bitcoin, and the next day you’ll have 0% of your net worth in Bitcoin because your kidnapping orders were read incorrectly, and SHA-256 was demonstrably broken by vendetta-driven cryptographers overnight[4]. Also, the Iranians are really mad at you[5].

Another way of looking at it is that it’s difficult to make money slowly with Bitcoin: there are no fundamentals[6] to inexorably drive value; you can’t yell “gains through trade, buy ’em all and let the market sort ’em out!”, put your money in an index fund equivalent, and then forget about it.

The life of a HODLR is a life with a hell of a lot of volatility; maybe there’s a better strategy?

#3. Time the market

The key is to buy low, sell high. This advice is approximately as useful as “be attractive, don’t be unattractive” labeled as dating advice.

If you think you can beat the market, I’ll point you to all the rest of the brilliant ideas that have been tried and failed, and the anti-inductive nature of the market, and the seeming adequacy of liquid markets. If you still think you have a grand insight into market mechanics, the great thing is that you can go make a billion dollars if you’re right. Go on, and try to remember us little people.

Besides, if I knew how to do this, would I be here telling you? I would be out playing with my Shiba herd instead.

#4. Recoup your investment

This strategy has the virtue of simplicity:

  • Buy some Bitcoin.
  • Wait until the price of Bitcoin doubles.
  • Sell half your Bitcoin, making back your original “investment”. Now it’s not possible to be worse off than before.
  • … HODL?

It’s nice to not lose money (as long as the market doesn’t crash out before you reach your doubled price), but you have one point at which you cash out, and then you’re back to not having any strategy.

#5. Rebalance

Another strategy is to simply rebalance.

A quick tutorial detour: let’s say there are only 2 investments in the world, boonds and stoocks[7]. Boonds are low risk, low reward, and stoocks are high risk, high reward.

Let’s say you’re a young’un who has just entered the job market with $1000 to put into the market, and you have an appetite for risk in order to get good returns. That means taking on higher risk, but that’s okay, since you’ll have plenty of years to rebuild if things go south. So you might go for a 90% stoocks, 10% boonds allocation, for $900 stoocks/$100 boonds.

Now let’s say that the market absolutely tanks tomorrow. Boonds don’t really change since they’re low risk; let’s say boonds take a 10% hit. But stoocks, man, they took a 95% hit. Now we’ve ended up with $45 stoocks/$90 boonds, meaning our asset allocation is 33.3% stoocks/66.7% boonds. #1. This is super sad, we’ve lost a lot of money, but #2. This isn’t what we want at all! We have so many boonds that our risk of losing most of what we have is low, but our returns are also going to be super low. Besides, even if we do lose it all, we’ll make it back in salary over a few days.

So what we can do is rebalance: we sell our abundance of boonds, and buy more stoocks, until we have a 90% stoock/10% boond allocation again, which works out to $121.5 stoocks/$13.5 boonds[8].

To fill out the rebalancing example, now let’s say you’re older and about to retire. Over the years you’ve shifted your asset allocation to 10% stoocks/90% boonds with $100000 stoocks/$900000 boonds: this close to retirement, you’d be in a lot of trouble if most of your money disappeared overnight, so you want low risk.

Now let’s say stoocks do fantastically well tomorrow, growing 100-fold, so you end up with $10000000 stoocks/$900000 boonds. The problem is that now your allocation by percentage is 91.7% stoocks/8.3% boonds, and you’re about to enter retirement. All your wealth is in a super-risky investment! Could your heart even handle the bottom of the market dropping out? Instead of letting that happen, you could rebalance back to 10% stoocks/90% boonds, or $1090000 stoocks/$9810000 boonds[9].
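The arithmetic is simple enough to sketch in a couple of lines of R (a toy check of the made-up numbers above, not the code behind the graphs later in this post; the function and argument names are mine):

  # Rebalance a two-asset portfolio back to a target allocation by value.
  rebalance <- function(stoocks, boonds, stoock_frac) {
    total <- stoocks + boonds
    c(stoocks = stoock_frac * total, boonds = (1 - stoock_frac) * total)
  }

  rebalance(45, 90, 0.9)      # young'un after the crash: 121.5 / 13.5
  rebalance(1e7, 9e5, 0.1)    # retiree after the boom: 1090000 / 9810000

The point is just that rebalancing mechanically sells whichever asset has drifted above its target share and buys whichever has drifted below it.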

What’s the moral of the story? If you have multiple asset risk classes, then you don’t have to put it all on black and ride the bubbles up and down like a cowboy: rebalancing is a simple strategy to target some amount of risk, and then you can just go long and not worry about the fine details.

There are finer details that do matter: you can’t rebalance Bitcoin often or you might get eaten alive by mining fees[10] (which peaked at an average of $50 when Bitcoin was around $10000). So maybe you’d target some large-ish percentage change and only rebalance once Bitcoin changes by that amount.

Let’s run some numbers: let’s say 1 Bitcoin is currently $1000, you have exactly 1 bitcoin, and you rebalance only whenever Bitcoin doubles in price (this basically extends the previous “double and sell” strategy). Now if Bitcoin goes from $1000 to $10000, you would rebalance 3 times: when Bitcoin is $2000, $4000, and $8000. If you have many more assets than $1000, you can hand wave away the exact percentage calculations and just sell half the Bitcoin at each point. Even if Bitcoin crashes to $0.001 after reaching $10000, you’ve “made” $3000 that you’ve rebalanced to other, stabler assets (minus fees of ~$70). Not bad for riding a speculative bubble!
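That back-of-the-envelope math is easy to check with another toy R sketch (again, the names and structure are mine, not the post’s actual analysis code):

  # Sell half of the remaining BTC every time the price doubles.
  sell_at_doublings <- function(start_price, final_price, btc = 1) {
    cash <- 0
    price <- start_price * 2
    while (price <= final_price) {
      cash <- cash + (btc / 2) * price   # sell half the stack at this doubling
      btc <- btc / 2
      price <- price * 2
    }
    c(cash = cash, btc_left = btc)
  }

  sell_at_doublings(1000, 10000)      # cash = 3000, 0.125 BTC left
  sell_at_doublings(1, 10000, 1000)   # the $1/BTC scenario below: cash = 13000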

#6. Kind of rebalance-ish

On the other hand, only getting $3000 out of a maximum of $10000 Bitcoin seems… not a good show. Sure, you were going to get only $0.001 if you were a HODLR, but that $10000 is a juicy number, and $3000 is an awful lot smaller.

Or consider the scenario in which you read Gwern in 2011 speculating that Bitcoin could reach $10000, and you were convinced that you should be long on Bitcoin. However, it was still possible that Bitcoin wouldn’t reach $10000, falling prey to some unforeseen problem before then. You would want to hedge, but rebalancing would throw away most of your gains before you got close to $10000. For example, if you started with $1000 @ $1/BTC for a total of 1000 BTC, and you rebalanced at every doubling, you would end up with $13000 cash and ~$1000 in BTC, compared to HODLing ending up with $10000000 in BTC. It’s a used car versus being the Pineapple Fund guy, I get it, it’s why HODLing is enticing.

The problem is that rebalancing doesn’t know anything about beliefs about long term outcomes, just about overall asset class volatility.

That said, if it’s possible to encode your beliefs as a probability distribution[11], you could run (appropriately named) Monte Carlo simulations of different selling strategies and see how they do, choosing a strategy that does well given what you expect the price of BTC to do.

I’ll work some simple examples, following some assumptions:

  • we start from a current price of $10000/BTC.
  • we don’t care about the day-to-day price: if BTC reaches $20000, dips back to $15000, and then rises to $50000, we aren’t concerning ourselves with trying to time the dip, just with the notion that BTC went from $20000 to $50000.
  • rebalancing is replaced with a hedge operation, where some fixed fraction of our Bitcoin stake is sold each time the price of BTC rises by some fixed proportion. We’ll fix our sell points at every doubling (except for a sensitivity analysis step below).
  • the transfer fees are set to be proportional to the price of BTC, at 0.5%: in practice, this just serves as a drag on the BTC-cash conversion. If you’re dealing with amounts much larger than 1 BTC (or SegWit works out), you might be able to amortize the transfer costs down to 0. To allow interpolating between both cases, we’ll simply give both 0.5% and 0% transaction drag simulations.
  • the price of Bitcoin is modeled as rising to some maximum amount, and then crashing to basically nothing. This can also cover cases where BTC crashes and stays low for such a long time that it would have been better to put your assets elsewhere.

The process of adapting the general principle to real life, consulting the economic/finance literature for vastly superior modeling methods, using more sophisticated selling strategies than selling a constant fraction, and not betting your shirt on black is left as an exercise for the reader.
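To make those assumptions concrete, here’s roughly what one simulated run looks like in R (a sketch under the assumptions above and my reconstruction, not the actual code used for the graphs; the function name and defaults are mine):

  # One simulated run: sell a fixed fraction of the remaining BTC each time the
  # price rises by `step` (step = 2 means selling at every doubling), until the
  # randomly drawn peak; whatever is still held is assumed worthless after the crash.
  simulate_run <- function(peak_price, sell_frac, start_price = 10000,
                           btc = 1, fee = 0.005, step = 2) {
    cash <- 0
    price <- start_price * step
    while (price <= peak_price) {
      sold <- btc * sell_frac
      cash <- cash + sold * price * (1 - fee)   # 0.5% transaction drag
      btc <- btc - sold
      price <- price * step
    }
    cash
  }

Setting step = 1.2 instead of 2 gives the “sell at every 1.2x” sensitivity check further down, and fee = 0 gives the zero-drag case.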

So let’s say our beliefs are described by a mangled normal distribution[12]: we’re certain BTC will reach the starting price (obviously, we’re already there), around 68% less certain BTC will reach 1 standard deviation above the starting price, 95% less certain BTC will reach the 2nd standard deviation, so on and so forth. We’re not interested in a max BTC price below our starting price, so we’re just chopping the distribution in half and doubling the positive side.

Since we’ve centered the normal distribution on our starting price, we have only one other parameter to choose, our standard deviation (stdev). Some values are obviously bad: choosing a stdev of $1 means you are astronomically confident that BTC will never go above $10100. While you might not believe in the fundamentals behind Bitcoin, it is odd to be so confident that the crash is going to happen in such a specific range of prices. On the other hand, I don’t have a formal inference engine from which I can get a stdev value that best fits my beliefs, so I’ll be generous and choose a middling value of $10000.
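In code terms, the belief distribution just controls how the peak prices are drawn. Reusing the simulate_run sketch from above (same caveats apply):

  # Peak prices drawn from a half-normal centered at the start price, stdev $10000.
  set.seed(1)
  peaks <- 10000 + abs(rnorm(10000, mean = 0, sd = 10000))

  # Rough expected proceeds over a grid of sell fractions.
  sell_fracs <- seq(0.1, 1, by = 0.05)
  expected <- sapply(sell_fracs,
                     function(f) mean(sapply(peaks, simulate_run, sell_frac = f)))

Plotting the raw simulated proceeds against the sell fraction (plus a LOESS trend line) is what the graphs below show.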

So if we run a number of simulations where the price of BTC follows the described normal distribution, we get:

Price simulation with a normal distribution

Several things become apparent right away:

  • there’s an obvious stepping effect happening[13]. Each separate line describes the effects of selling at each doubling: the lowest line only manages to sell once, the next line sells twice, and so on.
  • as one might expect, selling everything is low variance, and holding more is higher variance. As a reference point, the 0.5 sell fraction is just the previously described rebalancing strategy.
  • even when hitting 4 sell points, the transaction drag on 1 BTC isn’t too bad.
  • fitting a trend line with LOESS gets us a rough[14] measure of expected profit. In particular, we seem to top out at $20000 around a 0.5 sell fraction.

An obvious sensitivity analysis comes to mind: does the fact we’re selling only at every doubling matter? What if we sold more often? We can re-run the analysis when we sell at every 1.2x:

Price simulation with a normal distribution, selling at every 1.2x increase

The stepping effect is still there, but less obvious: we hit more steps on the way to the crash price. The largest data points don’t go as high, but you can also see fewer zero values, since we pick up some selling points between $10000 and $20000. Additionally, the LOESS peaks at a lower sell fraction, which makes some sense: since we’re hitting more sell points, we can afford to hold on to more.

What if the normal distribution doesn’t describe our beliefs? Say we want more emphasis on the long term. Then our beliefs might be better modeled with the exponential distribution, which has a thicker tail than the normal.

If we use $10000 as the exponential distribution’s scale (i.e. a mean of $10000 above the starting price, a rate of 1/10000), then our simulations look like:

Price simulation with an exponential distribution

The behavior isn’t too different, with the exception that some simulations start surviving to the 5th sell point. Additionally, the LOESS curves move to the left a bit compared to the normal, but only by a little: from eyeballing it, the peak might move from a sell fraction of 0.55 to 0.45.
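For what it’s worth, the only thing that changes in the earlier sketch is the draw of the peak prices (same caveats as before):

  # Peaks from an exponential with a $10000 mean above the start price.
  peaks <- 10000 + rexp(10000, rate = 1 / 10000)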

Again, there are more sophisticated analyses; for example, maybe you think that your probability distribution peaks around $100k/BTC and falls off to either side, in which case you would want a more complicated strategy to take advantage of your more complicated beliefs.

However, there’s a theoretical problem with our analyses thus far. The distributions we’ve been using are unbounded, allowing BTC prices that can theoretically go to infinity. Sure, we can treat economics as effectively unbounded: there sure are a lot of stars out there, and no economic activity has even left Earth orbit (Starman, some bacteria, and drawings of naked people notwithstanding). But that’s in the long run[15], and we only really care about BTC in the short term, when it’s generating “returns” in excess of normal market returns. For example, if BTC is wildly successful and becomes the world currency, it becomes hard to see how BTC can continue to grow in value far beyond the economic growth of the rest of the world[16]. So we might assume that once BTC eats the world, BTC just follows the bog-standard economic growth of the world, and ceases to be interesting relative to all other assets[17].

However, this does mean we can add two assumptions: our distributions should be bounded, and there’s a chance the value of our held BTC doesn’t all disappear in the end. I’ll bound our distributions at the current stock market cap (as of 2018/03/06 $80 trillion, rounded to $100 trillion for ease of math)[18], and use a 2nd function (not a probability distribution!) to encode the probability that if BTC reaches a certain price, it will crash.

For the probability of reaching a price, I’ll keep using the exponential distribution, but bounded and re-normalized to add up to 1 within the bounds[19]. For the probability that BTC will crash, we don’t need a distribution: we could imagine a function that always returns 100% for a crash (as we were assuming before), or 0%, or any value in between; importantly for the math, we’re not beholden to normalization concerns. I essentially free-handed this function piecemeal with polynomials, with the goal of reflecting a belief that either BTC stabilizes as a small player in the financial markets, or becomes the world currency and is unlikely to lose value suddenly. Plotted on log axes:

Price distribution and probability BTC doesn't crash
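Concretely, the bounded setup can be sketched by reusing simulate_run from earlier (my reconstruction; max_price, draw_peaks, and the stand-in p_no_crash curve are illustrative, not the actual free-handed function):

  # Peaks are capped at ~$10M/BTC (the ~1000x market-cap headroom), and a
  # hand-drawn curve gives the probability that held BTC keeps its value
  # instead of crashing.
  max_price <- 1e7
  draw_peaks <- function(n) {
    p <- 10000 + rexp(n, rate = 1 / 10000)
    p[p <= max_price]                  # crude truncation within the bound
  }
  p_no_crash <- function(price) {
    # stand-in curve: ~0 near the start price, approaching 1 as BTC eats the world
    ((log10(price) - 4) / (log10(max_price) - 4))^2
  }
  simulate_bounded <- function(peak, sell_frac) {
    cash <- simulate_run(peak, sell_frac)                  # proceeds on the way up
    btc_left <- (1 - sell_frac)^floor(log2(peak / 10000))  # stake still held at the peak
    cash + (runif(1) < p_no_crash(peak)) * btc_left * peak
  }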

When we run simulations (displayed on a log y-axis):

Price simulation with a bounded exponential distribution, on a log scale

Up to now transaction drag hasn’t mattered much, but here it suddenly becomes a big deal: if we end up in a world where BTC goes long and retains its value, that 0.5% drag turns out to be super important, preventing us from getting close to the maximum $10000000 from our initial 1 BTC. It’s not too surprising, since more mundane investments also need to deal with fee[20] and tax drag.

But if these beliefs are correct, do we do better on average? Not really, especially with transaction drag factored in. This holds true even when we zoom in on a linear axis[21][22]:

Price simulation with a bounded exponential distribution, on a linear scale

I’ll end here. You could always make your models more complicated, but I’m making precisely $0 off this, and that XCOM 2 isn’t going to play itself.


So after all this analysis, what do I recommend you do?

Trick question! I don’t recommend you do anything, because this post is not financial advice. If you persist in trying to take financial advice from someone who may frankly be a corgi, the world will laugh at you when BTC crashes to the floor and Dogecoin rises to take its place as the true master of cryptocurrencies. ALL HAIL THE SHIBA, WOOF WOOF.


R code used to generate the graphs available on github.


[1]  “But all those people are famous and invested in the status quo!” Okay, you got me, will linking to a non-super-rich acquaintance’s opinion on Bitcoin help?

To be even fairer, I could also come up with a similar list supporting Bitcoin instead, but I’m less interested in debating the merits of Bitcoin, and more interested in what you do once you wake up with a hangover and a wallet full of satoshis.

[2]  I disagree with Scott when he says that we should have won bigger with Bitcoin. Most of the gnashing of teeth over Bitcoin is pure hindsight bias.

[3]  Currently the only reason I would get any cryptocurrency is to use it as a distributed timestamping service.

[4]  It’s not just breaking the base crypto layer: the nations of the world could decide to get real and criminalize Bitcoin. Law enforcement could get better at deanonymizing transactions, causing all the criminals to leave for something like Monero. Price stabilization just never happens, and people get sick of waiting for it to happen. Transaction fees spike whenever people actually try to use Bitcoin as a currency, or the Lightning Network turns out to have deep technical problems after a mighty effort to put it into place (deep problems in a widely deployed technology? That could never happen!). Ethereum gets its shit together and eats Bitcoin’s lunch with digital kittens. There’s the first mtgox-level hack since BTC started trading on actual exchanges. People decide they want to cash out of the tulip market en masse (although that might be unfair to the tulips).

[5]  It’s unclear where you would get a Shah today, but exhuming all past Shahs is probably enough to piss people off.

[6]  No, evading taxes/police actions is not a fundamental.

[7]  Names munged to emphasize that they’re fantasy financial instruments.

[8]  There’s something to be said about keeping a stable and liquid store like a savings account to make sure living expenses are covered for 6 months. You can replace the implied “all assets” with “all available assets” for a more non-toy policy.

[9]  If the market simply dropped back to its previous position before you could rebalance, then you aren’t any worse off than you were 2 days ago, so maybe it wouldn’t be so disappointing to miss this opportunity. But that’s just anchoring, and Homo Economicus in your position would be super bummed.

[10]  Normal investments have similar tax implications where you realize gains/losses at sale, covered by the general term tax drag.

[11]  More on probabilities as states of beliefs, instead of simply reflecting experimental frequencies.

[12]  Coming up with a better distribution is left as an exercise for the reader.

[13]  A mild amount of jittering was added to make this visible with more simulation points.

[14]  LOESS fits with squared loss, which emphasizes outliers, which you might not want. Additionally, LOESS is an ad hoc computational method (much like k-means) which won’t necessarily maximize anything; the main advantage is that it looks pretty if you choose the right spans to average over, and you don’t have to come up with a parametric model to fit to.

[15]  And as they say, in the long run we’re all dead. Yes, we’re working on that.

[16]  Sure, the bubble could continue, but bubbles pop at some point, and if it’s so damn important to the economy war isn’t out of the question, and if large scale nuclear war happens, more than just the price of Bitcoin is going to crash. “Here lies humanity: they committed suicide by hard math.”

Or a different perspective. Who would win?

  • Billions of people that didn’t buy into Bitcoin, all frozen out of the brave new economy, backed by all the military might of nations that care about the sovereignty of their money supply.
  • One chainy boi.

[17]  There are reasons to believe BTC might act otherwise:

  • Bitcoin is deflationary, so it probably won’t act like a normal commodity in the limit if it eats the world. Even companies can issue more stock, and more gold can always be found.
  • The marginal Bitcoin might be way overpriced forever.

[18]  Interestingly, this implies that BTC only has around 1000x of hyper-growth headroom.

[19]  The distribution chart is not properly normalized, since the distribution is actually linear without the log axis, but it simulates correctly.

[20]  The movement to index funds seems partly rooted in avoiding high mutual fund fees.

[21]  I’m not entirely sure what that hump in the ideal price is doing: it shows up in the other LOESS curves, and persists with changes in the random seed.

[22]  We end up with a different maximum hump with the log and linear graphs: what’s going on here? Keep in mind that LOESS operates on minimizing squared error, and minimizing squared log error is a bit different than minimizing squared error.

Radical Transparency

Nothing that hasn’t been said before, but it didn’t click for me until I thought about it some more and had an AHA! moment, so I’m doing my own write-up.

Let’s say that you’re faced with a Newcomb problem[1].

The basic gist is this: Omega shows up, an entity that you know can predict your actions almost perfectly. Concretely, out of the last million times it has played out this scenario, it has been right 99.99% of the time[2]. Omega presents you with two boxes, of which box A contains $1000000 or nothing, and box B always contains $1000. You have only two choices, take just box A (one boxing) or take both box A and B (two boxing). The twist is that if Omega predicted you would two box, then A is empty, but if it predicted you would one box, then box A contains the $1000000.

Causal decision theory (CDT) is a leading brand of decision theories that says you should two box[3]: once Omega presents you with the boxes, Omega has already made up its mind. In that case, there’s no direct causal relationship between your choice and the boxes having money in them, so the box A already has $1000000 or nothing in it. So, it’s always better to two box since you always end up with $1000 more than you would otherwise.

People who follow CDT to two boxing claim that one boxing is irrational, and that Omega is specifically rewarding irrational people. To me it seems clear CDT was never meant to handle problems that include minds modeling minds: is it also irrational to show up in Grand Central station at noon in Schelling’s coordination problem, despite the lack of causal connection between your actions and the actions of your anonymous compatriot? So you might agree that CDT just doesn’t do well in this case[4] and decide to throw it out the window for this particular problem, netting yourself an expected $999900 from one boxing[5], instead of the expected $1100 payout from two boxing.
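
As a sanity check on those numbers, here’s the expected value arithmetic spelled out, using only the 99.99% accuracy figure and the box contents from the setup above (a back-of-the-envelope sketch, nothing deep):

    # Omega is right 99.99% of the time; box A holds $1000000 only if Omega
    # predicted one boxing; box B always holds $1000.
    p_correct = 0.9999

    # One boxing: take only box A, which is full iff Omega predicted correctly.
    ev_one_box = p_correct * 1_000_000 + (1 - p_correct) * 0               # $999,900

    # Two boxing: always get box B's $1000; box A is full only on a misprediction.
    ev_two_box = p_correct * 1_000 + (1 - p_correct) * (1_000_000 + 1_000)  # $1,100

    print(ev_one_box, ev_two_box)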

But let’s throw in a further twist: let’s say the boxes are transparent, and you can see how much money is inside, and you see $1000000 inside box A, in addition to the $1000 inside box B. Now do you two box?


I previously thought “duh, of course”: you SEE the two boxes, both with money in them. Why wouldn’t you take both? A friend I respect told me that I was being crazy, but didn’t have time to explain, and I went away confused. Why would you still one box with an extra $1000 sitting in front of you?

(Feel free to think about the problem before continuing.)

The problem was that I was thinking too small: I was thinking about the worlds in which I had both boxes with money in them, but I wasn’t thinking about how often those worlds would happen. If Omega wants to maintain a 99.99% accuracy rate, it can’t just give anyone a box with $1000000. It has to be choosy, to look for people that will likely one box even when severely tempted.

That is, if you two box in clear-box situations and you get presented with a clear box with $1000000 in it, congratulations, you’ve won the lottery. However, people like you simply aren’t chosen often (at a 0.01% rate), so in the transparent Newcomb world it is better to be the sort of person that will one box, even when tempted with arguably free money.


The clear-box formulation makes it even clearer how Newcomb’s problem relates to ethics.

Yes, ethics. Let’s start with what Omega might put in an advertisement:

“I’m looking for someone that will likely one box when given ample opportunity to two box, and literally be willing to leave money on the table.”

Now, let’s replace some words:

“I’m looking for a <study partner> that will likely <contribute to our understanding of the class material> when given ample opportunity to <coast on our efforts>.”

“I’m looking for <a startup co-founder> that will likely <help build a great business> when given ample opportunity to <exploit the business for personal gain>.”

“I’m looking for <a romantic partner> that will likely <be supportive> when given ample opportunity to <make asymmetric relationship demands>.”

In some ways these derived problems are wildly different: these (lowercase) omegas don’t choose correctly as often as 99.99% of the time, there’s an iterated aspect, both parties are playing simultaneously, and there’s reputation involved[6]. But the important decision theory core carries over, and moreover it generalizes past “be nice” into alien domains that include boxes with $1000000 in them, and still correctly decides to get the $1000000.


[1]  I agree for most intents and purposes that the Parfit’s Hitchhiker formulation of the problem is strictly better because it lacks problems that commonly trip people up in Newcomb’s problem, like needing a weird Omega. However, then you get the clear-box problem right away, and I’m going for more incremental counter-intuitive-ness right now.

[2]  Traditional Newcomb problem formulations started with a perfect predictor, but it becomes a major point that people get tripped up over because it’s so damn “unrealistic”. I’m sure no one would object to Omega never losing tic-tac-toe, but no one seems to want to accept a hypothetical entity that can run TEMPEST attacks on human brains and do inference really well. Whatever, it’s ultimately not important to the problem, so it’s somewhat better to place realistic bounds on Omega.

[3]  Notably, Evidential Decision Theory says you should one box, but fails on other problems, and makes it a point to avoid getting news (which isn’t the worst policy when applied to most common news sources, but this applies to all information inflow).

[4]  I haven’t really grokked it, but friends are excited about functional decision theory, which works around some of the problems with CDT and EDT.

[5]  It’s not exactly $1000000, since Omega isn’t omniscient and only has 99.99% accuracy, so we have to take the average of the outcomes weighted by their probability to get the overall expected outcome ($1000000 * 0.9999 + $0 * 0.0001 = $999900).

[6]  Notably, it starts to bear some resemblance to the iterated prisoner’s dilemma.

Tape is HOW expensive?

Maybe you've seen that hard drive prices aren't falling so quickly. Maybe you've seen the articles making claims like "tape offers $0.0089/GB!"[1], looked at recent hard drive prices, and seriously thought about finally fulfilling the old backup adage "have at least 3 backups, at least one of which is offsite" with some nice old-school tape[2].

So you'd open up a browser to start researching, and then close it right afterwards in horror: tape drive prices have HOW many digits? 4? The prices aren't even just edging over $1000, they're usually solidly into the $2000s, or higher. Maybe then you start thinking about just forking over all your money to The Cloud™ to keep your data.

But maybe it's worth taking a look and seeing exactly how the numbers work out. As an extreme example, if you can buy a $2000 device that gives you infinite storage, then that is a really interesting proposition[3]. Of course, the media costs for tape aren't zero, but they are cheaper than the equivalent capacity in hard drives. Focusing in, the question becomes: when does the lower cost of each additional terabyte on tape overcome the fixed cost of the tape drive, such that tape systems become competitive with hard drives?


Some background: tape formats are defined by the Linear Tape-Open (LTO) Consortium[4], which periodically defines bigger and better interoperable tape formats, helpfully labeled LTO-N. Each jump in level roughly corresponds to a doubling of capacity, such that LTO-3 holds 400GB/tape while the recent LTO-8 holds 12TB/tape.

And some points of clarification:

  • LTO tapes usually have two capacity numbers; for example, LTO-3 tapes usually advertise themselves as being able to contain 400 or 800GB. If you're lucky, the advertising material will suffix "(compressed)" sotto voce, notifying you that the 800GB number is inflated by some LTO-blessed pie-in-the-sky compression factor. Ignore this, just look at the LTO level numbers and their uncompressed capacity.
  • We usually talk about hard drives as a single unit (if you can see the individual hard drive platters, that means you are having a bad problem and you will not be storing data on that drive today), but tape is more closely related to the floppy/CD drives of yore, where media is freely exchangeable between drives.

First, I gathered some hard numbers on cost. I trawled Newegg and Amazon for drives and media for each LTO level from 3 to 8, grabbing prices for the first 3 drives from each source and 5 media from each. Sometimes this wasn't possible, like for LTO-8: it's recent, and I could only find 2 different drives. I restricted myself to a handful of pricing examples because I didn't want to gather data endlessly (there are a lot of people selling LTO tapes), but I also didn't want to end up sifting through outliers, trying to judge from scant information whether unusually low/high prices were legitimate offers or signs that something was wrong with the seller/device. Whatever, I just got enough data to average it out[5].

Second, I took the average media cost for an LTO level, and how much uncompressed data that level could store, and figured the cost per TB. It's true that some of the later LTO levels should look a lot more discretized: for example, storing 5 and 10 TB on an LTO-8 tape (which can store 12TB) will cost exactly the same, while you'll need around twice as many LTO-3 tapes. However, just making everything linear makes analysis a lot easier, and will give approximately correct answers. If it turns out that tape becomes competitive at only a small multiple of a single tape's capacity, we can re-run the numbers with the discretization included.

Then, it's just a matter of solving a couple of linear equations, one representing the tape fixed and variable costs, and the other the hard drive costs. To capture some variability in the hard drive cost, I compared the tapes against both a hypothetical cheap $100/4TB drive and a $140/4TB drive[6].

Cost_{Tape} = TapeMedia/TB \cdot Storage + TapeDrive
Cost_{HD} = HD/TB \cdot Storage

Finding the storage point where the costs become equal to each other:

Storage_{competitive} = \frac{TapeDrive}{HD/TB - TapeMedia/TB}

When we solve with some actual data (Google Sheets), we get the smallest competitive capacity going to LTO-5 (1.5TB/tape). And yet, it doesn't look good: if we're comparing against expensive hard drives, we need to be storing ~100TB to become competitive, and if we're comparing against cheap hard drives, we need ~190TB to break even.
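
For the curious, the break-even formula is easy to play with yourself. The numbers below are made-up round figures in the rough ballpark of the LTO-5 case (a ~$2100 drive, ~$14/TB media), not the actual averages from my spreadsheet, but they land near the same break-even points:

    # Storage (TB) at which total tape cost equals total hard drive cost.
    # All prices here are illustrative placeholders, not my gathered data.
    def breakeven_tb(tape_drive, tape_media_per_tb, hd_per_tb):
        return tape_drive / (hd_per_tb - tape_media_per_tb)

    TAPE_DRIVE = 2100         # hypothetical LTO-5-ish drive price, $
    TAPE_MEDIA_PER_TB = 14    # hypothetical tape media cost, $/TB

    print(breakeven_tb(TAPE_DRIVE, TAPE_MEDIA_PER_TB, hd_per_tb=140 / 4))  # vs $140/4TB drives: ~100 TB
    print(breakeven_tb(TAPE_DRIVE, TAPE_MEDIA_PER_TB, hd_per_tb=100 / 4))  # vs $100/4TB drives: ~190 TB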

So I did some more sensitivity analysis: right now, drives and media are expensive for the recent LTO-7 and 8 standards. Will our conclusions change when LTO-7/8 equipment drops to current LTO-5 prices? Comparing to expensive drives, the minimum competitive capacity drops to ~65TB, but that's assuming no further HD R&D, and is still way above the amount of data I will want to store in the near future[7].

In retrospect, it should have been obvious that the huge fixed cost of tape drives, along with non-minuscule variable costs, just doesn't make sense for any data installation that isn't handling Web Scale™ data.

And that's not even fully considering all the weird hurdles tape has:

  • It's unclear whether there are RAID-like, tape-appropriate filesystems/data structures, especially when you don't have N drives that you can write to at the same time. You can read stories about wrestling with tape RAID, but it doesn't seem to be a feature of the standard Linear Tape File System.
  • Tied in with the previous point, you'll need to swap tapes once one of them fills up. Or if you're trying to get media redundancy, you'll need to do a media swapping dance every time you want to back up. Needing to manage backup media isn't really great when you're trying to make backups so easy they're fire-and-forget.
  • Tape drives are super expensive, which makes them a giant single point of failure. Having redundant drives means you need even more data to stay competitive with normal hard drives.

So we've arrived at the same conclusion as our gut: tapes are overdetermined to be a bad idea for the common consumer. If you can get really cheap clearance/fire sale drives, it might become worth it, but keep in mind the other concerns listed above.

Data and analysis available on Google Sheets.


[1]  Which initially doesn't sound very impressive, given Backblaze's B2 offers $0.005/GB. However, that's an ongoing monthly cost: two months is enough to put tape back into the game, at least according to the linked Forbes article. (I've also remembered more impressive numbers in other articles, but maybe that's just my memory playing tricks on me.)

[2]  Tape has nice properties beyond just having a lower incremental storage cost. It's offline as opposed to constantly online: once you have access to a hard drive, you can quickly overwrite any part of it, but since it isn't possible to reach tapes that aren't physically in the drive, it becomes much more difficult to destroy all your data (say, in a ransomware attack). Tapes possibly have a longer shelf life, and you can theoretically write to tape faster than to hard drives.

[3]  If nothing else, owning as many universe-breaking/munchkin-approved pieces of technology as possible seems like a good policy.

[4]  Sure, you can use VCRs for storage with ArVid, but it is not competitive at all at 2GB on 2 hour tapes. It could probably be made to work better since it uses only 2 luminance levels instead of a full 256+ gradations, but the graininess of home videos doesn't give me hope for much better resolution. Plus, you can do all that extra work, but you'll only end up with capacity comparable to current Blu-Rays. And, where are you going to find a bunch of VCR tapes these days?

[5]  Taking the median is probably better for outlier rejection, and taking the minimum price in each category would probably be a good sensitivity analysis step. I don't believe either choice drastically changes the output for me, since I have relatively small amounts of data to store, but you might want to run the numbers yourself if you have more than, say, 20TB to store.

[6]  It's true that there will likely be some additional hardware costs to actually access more than 12 hard drives, but if nothing else you could go the storage pod route and get 60 drives to a single computer, so we'll just handwave away the extra costs.

[7]  Honestly, I'm not even breaking 1TB at the moment.

2017 Review

If there’s a theme for my 2017, it seems to be FAILURE.

FAILURE at cultivating habits

  • Due to the addition of a morning standup at work, I noticed I was getting in much later than I thought. I could previously pass off some pretty egregious arrival times as “a one time thing” to myself, but not when a hard deadline made it clear that this was happening multiple times a week. So I tried harder to get in earlier, and this made basically no impact.
  • I noticed I was spending a lot of time watching video game streaming; Twitch streams are long, regularly 4 hours, which would just vaporize an evening or an afternoon. It’s not so much that it was a ton of total time, but it was basically a monolithic chunk of time that wasn’t amenable to being split up to allow something to get done each day. I love you beaglerush, but the streams are just too damn long, so I decided I should stop watching game streams. However, I just felt tired and amenable to bending the rules at approximately the same rate as before, so my behavior didn’t really change.
  • I’m a night owl, to the extent that going to sleep between midnight and 3 AM probably covers 95% of my sleeping times, and the rest is heavily skewed towards after 3 AM. So I started tracking when I went to sleep, and had some friends apply social demerits when I went to sleep late. I got mildly better, but was still all over the place with my sleep schedule.

There’s a happy-ish ending for these habits, but first…

FAILURE at meeting goals

A year ago, I decided to have some resolutions. However, I didn’t want them to be year-long resolutions: a year is a long fucking time, and I knew I’d be pretty susceptible to…

  • falling off the wagon and then not getting back on, burning the rest of the year, or
  • mis-estimating how big a year-sized task would be, which would probably only become apparent near the middle of the year. If I got it really wrong, it would be months before I could properly try again.

So similarly to my newsletter tiers, I decided to break the year into fifths (quinters?), and resolved to do something for each of those. I hoped it would be long enough to actually get something done, while being short enough that I could iterate quickly.

So, how did I do?

Quinter 1

  • FAILURE. Finish the hardware part of project noisEE. Design turned out to be hard, did a design Hail Mary that required parts that didn’t get here before the end of the quinter.
  • Stretch FAILURE. Read all of Jaynes’ Probability Theory. Got only ~40% of the way through: it turns out trying to replicate all the proofs in a textbook is pretty hard.
  • FAILURE. Try to do more city exploratory activities. Planning and executing fun/interesting activities was more time consuming than anticipated, and I didn’t account for how much homebody inertia I harbored and how time consuming the other goals would be.
  • SUCCESS. Keep up all pre-existing habits.

Quinter 2

  • FAILURE. Finish project noisEE. It turns out the Hail Mary design was broken, who could have guessed?
  • SUCCESS (mostly). Make a NAS (network attached storage) box. Technically, the wrap up happened the day after the quinter ended.
  • SUCCESS. Keep up all pre-existing habits. Apparently attaining this goal isn’t a problem, so I stopped keeping track of this in future quinters.

Quinter 3

  • SUCCESS/FAILURE. Finish project noisEE, or know what went wrong while finishing. There was a problem with the 2nd Hail Mary, which I debugged and figured out, but it was expensive to fix, so I didn’t stretch to actually fix it. However, the next quinter I didn’t respect the timebox, which was the entire point of this timebox[1].
  • FAILURE. Make a feedback widget for meetups. After designing it, I discovered I didn’t want to spend the money to fabricate the feasible “worse is better” solution.
  • SUCCESS. Spend 20 hours on learning how to Live Forever. Spent 30+ hours on research.

Quinter 4

It’s about this time that I start enforcing goal ordering: instead of doing the easiest/most fun thing first, I would try to finish goals in order, so large and time consuming tasks don’t get pushed to the end of the quinter.

  • SUCCESS. Finish ingesting in-progress Live Forever research. Just wanted to make sure momentum continued from the previous quinter so I would actually finish covering all the main points I discovered I wanted to include.
  • SUCCESS (sad). Fix project noisEE, or give up after 4 hours. I gave up after 4 hours, after trying out some hacks.
  • SUCCESS. Write up noisEE project notes. Surprisingly, I had things to say despite not actually finishing the project, making the notes into a mistakes post.
  • FAILURE. Write up feedback widget design for others to follow. For some reason, I ignored my reluctance to actually build the thing and assumed I would get value out of writing up how to build it instead. Talk about a total loss of motivation.
  • SUCCESS. Write up the Live Forever research results, post about them. Includes practicing presenting the results a number of times.
  • Stretch FAILURE. Prep the meta-analysis checklist. Didn’t have time or the necessary knowledge.

Quinter 5

At this point, I’m starting to feel stretched out, so I started building break times into my goal structure.

  • SUCCESS. Prepare to present the Live Forever research. I was probably too conservative here: I could have planned to actually present, since nothing foreseeable would have prevented that from happening.
  • FAILURE. Take a week off project/goal work. I thought I would have only 1 week to prepare to present, but it turned into 2-3 weeks and broke up this break week, which was not nearly as satisfying.
  • SUCCESS. Redesign the U2F Zero to be more hobbyist friendly[2].
  • SUCCESS. Do regular Cloud™ backups[3][4].
  • SUCCESS. Take 1 week off at the end of the year. That’s when I’m writing this post!

Miscellaneous FAILURE

There’s so much FAILURE, I need a miscellaneous category.

Speaking of categories, I was organizing a category theory reading group for the first third of 2017 based on Bartosz’s lectures, but eventually the abstractions on abstractions got to be too much[5] and everything else in life piled on, and we ended up doing only sporadic meetups where we sat around confused before I decided to kill the project. In the end, we FAILED to reach functional programming enlightenment.

I’ve even started to FAIL at digesting lactose. It’s super sad, because I love cheese.


Why was there so much FAILURE this year?

Part of it is that I had more things to FAIL at. For example, I wouldn’t previously keep track of how I was doing at my habits, and color code them so I could just look at my tracker and say “huh, there’s more red than usual”. Or, I wouldn’t previously have the data to say “huh, I went to sleep after 3AM 2 weeks in a row”[6].

And in a way, I eventually succeeded: for each of the habits I listed earlier, I applied the club of Beeminder and hit myself until I started Doing The Thing. Does my reliance on an extrinsic tool like Beeminder constitute a moral failing? Maybe, but the end results are exactly what I wanted:

  • I got super motivated to build up a safety buffer to get into work early (even before getting my sleep schedule together!),
  • only broke Twitch abstinence twice since starting in May[7],
  • immediately went from an average sleeping time of 2AM to almost exactly 12:29[8].

And for goals, I opened myself up to FAILURE by actually making fine-grained goals, which meant estimating what I could do, and tracking whether I actually did them. In a way, there are two ways to FAIL: I could overestimate my abilities, or I could simply make mistakes and FAIL to finish what I otherwise would have been able to do. In practice, it seems like I tended to FAIL by overestimating myself.

It’s pretty obvious in retrospect: I started out by FAILING at everything, and then started cutting down my expectations and biting off smaller and smaller chunks until I actually hit my goals. Maybe I should have built up instead of cutting down, but I wanted to feel badass, and apparently the only way you can do that is by jumping in the deep end, so FAILING over and over it is. On the other hand, I think I just got lucky that I stuck it out until I got it together and started hitting my targets, so if you can do it by building upwards, that might work better.

Takeaways

So going forward what are the things I’d keep in mind when trying to hit goals?

  • Think through more details when planning. Saying “I will do all the proofs in Probability Theory” is fine and good, but there’s only so much time, and if you haven’t worked even one of the proofs, then it’s not a goal, it’s a hope and a prayer. Get some Fermi estimates in there, think about how long things will take and what could go wrong (looking at you, hardware turn-around times[9]).
  • If you’ve never done a similar thing before, then estimating the effort to hit a certain goal is going to be wildly uncertain. Pare the goal way down, because there are probably failure modes you’re not even aware of. For example, “lose 5 pounds” would be a good goal for me, because I’ve fiddled with the relevant knobs before and have an idea about what works. “Make a coat from scratch” is a black box to me, hence not a good goal. Instead, I might aim for “find all the tough parts of making a coat from scratch”, which is more specific, more amenable to different approaches, and doesn’t set up the expectation of some end product that is actually usable[10].
  • Relatedly, 10 weeks (about the length of a quinter) is not a leisurely long time. Things need to be small enough to actually fit, preferably small enough to do in a sprint near the end of the quinter. I know crunch time is a bad habit carried over from my academic years, but old habits die hard, and at least the things get done.
  • Build in some rest. I pulled some ludicrous hours in the beginning of the year, and noticed as time went on that I seemed less able to put in a solid 16 hours of math-working on the weekends. My current best guess is that I haven’t been taking off enough time from trying to Do The Thing, so I’m building in some break times.
  • Don’t throw away time. You’ll notice that I kept the noisEE dream alive for 4 quinters, each time trying tweaks and hacks to make it work. It’s clear now that this is a classic example of the sunk cost fallacy, and that I either should have spent more time at the beginning doing it right, or just let it go much earlier.

    Another way to throw away time is to try and do things you don’t want to do. My example is trying to make/post the feedback widget, which is pretty simple, but I discovered I couldn’t give any shits about it after the design phase. This isn’t great, because I said I wanted to do the thing, and not doing the thing means you’re breaking the habit of doing the things you’ve set out to do (from Superhuman by Habit). Unfortunately, I’m still not sure how to distinguish when you really want to do something versus when an easily overridden part of yourself thinks it’s virtuous to want to do something, which is much less motivating.

  • Goal hacks might be useful. Looking at it, the main hack I used was timeboxes, which worked sometimes (total longevity research came in within a factor of 2 of my timebox estimate) and not so well in others (noisEE overflowed). It seems to be most useful when I’m uncertain how much actual work needs to be done to achieve some goal, but I still want to make sure work happens on it. After working on it for some number of hours, it should be clearer how sizable the task is and it can get a more concrete milestone in the next round.

    Stretch goals might also work, but making things stretch goals seems like a symptom of unwanted uncertainty, and they tend to be sized such that they never actually get hit. Unless I find myself stagnating, I plan on just dropping stretch goals as a tool.

  • If you’re not doing the thing because of something low-level like procrastination, a bigger stick to beat yourself with might help. Beeminder is my stick of choice, with the caveat that you need to be able to be honest with yourself, and excessive failure might just make you sad, instead of productive.

    (As a counterpoint, you might be interested in non-coercive ways to motivate yourself, in which case you might check out Malcolm Ocean’s blog.)


Despite all the FAILURE, I think I agree with the sentiment of Ray’s post: over the past few years, I’ve started getting my shit together, building the ability to do things that are more complicated than a single-weekend project and the agency to pursue them.

That said, most of the things I finished this year are somewhat ancillary, laying the groundwork for future projects and figuring out what systems work for me. Now that I’ve finished a year testing those systems and have some experience using them, maybe next year I can go faster, better, stronger. Not harder, though, that’s how you burn out.

Well, here’s to 2018: maybe the stage I set this year will have a proper play in the next.


[1]  Thinking about it, timeboxes have two uses. One is to make a daunting task more tractable: you commit to only doing a small timebox, and if you want to keep going afterwards, great! The other is to make sure that a task that would otherwise grow without bound stays bounded. I intended the noisEE timebox to be used in the bounding fashion, so when I kept deciding to keep working on it, that meant the timebox was broken.

[2]  This project does not have a post yet, and may never have one. Hold your horses.

[3]  Offsite backups are an important part of your digital hygiene, and the Butt is the perfect place to put them.

[4]  If people really want it, I can post about my backup set up.

[5]  Don’t worry, it’s easy, an arrow is like a functor is like an abstract transformation!

[6]  Knowledge is power, France is bacon.

[7]  One of these wasn’t Twitch at all, but a gaming stream I accidentally stumbled across on YouTube; it still counts.

[8]  HMM I WONDER WHEN I SET MY SLEEP DEADLINE.

[9]  Unless you’re willing to pay through the nose, getting boards on a slow boat from China takes a while.

[10]  The tradeoff is that the 2nd goal is more nebulous: how do you know that you’ve found all the tough parts of making a coat? Maybe timeboxes would help in this case.

Ain’t No Calvary Coming

Epistemic status[1]: preaching, basically. An apology, in both senses[2].

I know my mom reads my blog; hi, mom.

Mothers being mothers, I figure I owe her a sit-down answer to why I’m not Christian, and don’t expect to re-become Christian[3]. Now, I don’t expect to convince anyone, but maybe you, dear reader, will simply better understand.


Let’s start at the end.

Let’s start with the agony of hell, and the bliss of heaven. Sure, humans don’t understand infinities, don’t grasp the eye-watering vastness of forever nor the weight of a maximally good/bad time. Nevertheless, young me had an active imagination, so getting people out of the hell column and into the heaven column was obviously the most important thing, which made it surprising that my unbeliever friends were so unconcerned with the whole deal. I supposed that they already had a motivated answer in place: as heathens, they would be wallowing in unrepentant hedonism, and would go to great lengths to make sure they kept seeing a world free of a demanding and righteous God.

I knew the usual way to evangelize, but it depended to a frustrating degree on the person being evangelized to. It seemed unacceptable that some of my friends might go to hell just because their hearts were never in the right place. Well, what if I found a truly universal argument for my truly universal religion? The Lord surely wouldn’t begrudge guidance in my quest to find the unmistakable fingerprints of God (which were everywhere, so the exercise should be a cakewalk), and I would craft a marvelous set of arguments to save everyone.

Early on, I realized that the arguments I found persuasive wouldn’t be persuasive to the people I wanted to reach: if you assumed the Bible was a historical text you would end up saying “no way, Jesus did all these miracles, that’s amazing!”, but what if you didn’t trust the Bible? I would need to step outside of the assumption that God existed, and then see the way back. Was this dangerous to my faith? Well, I would never really leave: I would just be empathetic and step into my friend’s shoes, to better know how to guide them into the light. And you remember the story about walking with Jesus on the beach? There was no way this could go wrong!

Looking back, I see that my thoughts were self-serving. As a product of both faith and science, I wanted to make it clear that religion could meet science on its own terms and win. If the hierarchy of authority didn’t subordinate science to religion, then…?

So I studied apologetics[4], particularly Genesis apologetics. I made myself familiar with things like young vs. old earth creationism, the tornado-in-a-junkyard equivocation, and attacks I could make on gradualism and punctuated equilibrium[5]. I was even dazzled by canopy theory, where a high-altitude aerial ocean wrapped the planet, providing waters for The Flood and allowing really long lifespans by blocking harmful solar radiation[6]. I went on missions, raising money and overcoming my natural reticence to talk to people about the Good Word. I even listened almost solely to Christian rock music.

Now, I don’t doubt I believed: I felt the divine in retreats and mission trips, me and my brothers and sisters in Christ singing as one[7]. I prayed for guidance, hung on the words of holy scripture, found the words for leading a group prayer, and eventually confirmed my faith. As part of my confirmation, I remember being baptized for the 2nd time in high school[8]: a clear, lazy river had cut a gorge into sandstone, and the sunset lit the gorge with a warm glow. Moments before I went under the water, I thought “of course. How could I doubt with such beauty in front of me?”.

But some of these experiences also sowed the seeds of doubt. Someone asked if I wanted the blessing of tongues: I said yes, thinking a divine gift of speaking more than halting Spanish would be great for my upcoming mission trip. And, how cool would it be to have a real world miracle happen right in front of me‽ Later I tried to figure out if glossolalia was in fact the tongue of angels[9], but I didn’t come up with anything certain, which was worrying. Why were my local leaders enthusiastic about this “gift of tongues”[10], but other religious authorities were against the practice? On a mission trip I told someone I could stay on missions indefinitely (in classic high school fashion, I had read the word “indefinite” a few times and thought it sounded cool) and was brought up short when they responded with skepticism that someone could stay forever; why wouldn’t they stay if the work was righteous, comfortable living be damned? Or I would think about going to seminary instead of college, and wonder if that was God’s plan for me.

How did I know what was right, what was true?

The thing is that I didn’t even begin to know. On my quest for answers, I didn’t comprehend the sheer magnitude of 2000 years of religious commentary[11]. I didn’t grasp how hairy the family tree of Christian sects was, each with their own tweaks on salvation. I read Mere Christianity and a few books on apologetics, and thought it would be enough. I didn’t even understand my enemy at all, refusing to grapple with something so basically wrong as The Selfish Gene. Into this void on my map of knowledge I sailed, a theological Columbus, expecting dragons where there was a whole continent of thought.

So the more I learned, the more doubt compounded. When my church split, I wondered why such a thing could happen: were some of the people simply wrong about a theological question? That raised more disturbing questions about how one could choose the truest sect of Protestant Christianity, ignoring “cults” like Mormonism or Catholicism or Eastern Orthodox or even other religions entirely, like Islam (and there are non-Abrahamic religions, too‽). Or, maybe a church split could happen for purely practical concerns, but it was disturbing that such an important event in a theological institution wasn’t grounded in theological conflict: if not a church split, then what should be determined by theology?[12] And, I realized other religions had followers with similarly intense experiences: what set mine apart from theirs?

Again, what did I know, and how did I know it?

Don’t worry, my spiritual leaders would say. God(ot) is coming, just wait here by this tree and he’ll be along any moment now[13].

And maybe God would come, but he would maintain plausible deniability, an undercover agent in his own church. Faith healings wouldn’t do something so visible as give back an arm, just chase away the back pain of a youth leader for a while. My church yelled prayers over a girl with a genetic defect, and the only outcome was frightening her[14]. Demonic possession leading to supernatural acts isn’t a recorded phenomenon, despite the proliferation of cameras everywhere. So the whispers of godhood would always scurry behind the veil of faith whenever a light of inquiry shone on it.

I started refusing to stand during praise. Singing with this pit of questions in my stomach seemed too much like betrayal, displaying to the world smiles and melodies I knew were empty. I sat and thought instead, trying to retrace Kant’s Critique of Pure Reason without Kant’s talent[15]. I couldn’t accept the dearth of convincing evidence and simply trust, when all my instincts and training screamed for a sure foundation, when I knew a cosmic math teacher would circle my answer of “yes, God exists!” and scribble in red “please show your work”.

I told myself I would end it in a blaze of glory, pledging fealty to a worthy Lord, or flinging obscenities at the sky and pulpit when they didn’t have the answers. Instead, my search for god outside of god himself petered out under a pile of unanswered questions[16], and I languished in a purgatory of uncertainty. In a way, I was mourning the death of god. It took years, but now I confidently say I’m an atheist.


So that was the past. What about the future?

Sometimes the prodigal son falls on hard times and has to come home; in the case of the church, home has a number of benefits. Peace of mind that everything will turn out okay. A sabbath, if one decides to keep it. A set of meditation-like practices at regular intervals (even in Christianity!). A set of high-trust social circles[17] with capped vitriol (in theory; in practice, see the Protestant Reformation and aforementioned church splits), a supportive community with a professional leader, a time to all feel together. Higher levels of conscientiousness. Higher productivity[18]. The ability to attract additional votes in Congressional races. Chips at the table of Pascal’s Wager[19].

Perhaps most importantly, though, is a sense of hope. How does one have hope for the future when there is only annihilation at the end?

Paul saw the end, a world descending into decadence, a world that couldn’t save itself: hell, given a map, it wouldn’t save itself. Contrary to this apocalyptic vision, scientism[20]/liberalism preaches abundance, the continual development of an ever better world. We took the limits of man and sundered them; we walked on the moon, we eradicated polio, we tricked rocks into thinking for us, and we’ll break more limits before we’re done. Paul was the product of an endless cycle of empires; we’re on a trajectory to leave the solar system[21].

There is light in the world[22], and it is us.

But if the world is simply getting better, then does it matter what I believe? Well, our rise is only part of the story: it took tremendous work to get from where we were to where we are, and the current world is built on the blood of our mistakes[23]. The double-edged sword of technology could easily lop off our hand if we’re not careful. We’ve done some terrible things already, and finding the Great Leap Forward-scale mistakes with our face is hideously expensive.

So progress is possible, but we haven’t won. How do the engineers say it? “Hope is not a strategy.” There ain’t no Calvary coming[24], ain’t no Good King to save us, ain’t no cosmic liquidation of the global consciousness, ain’t no millennium expiration date on suffering. A reductionist scientific world is a cold world without guardrails, with nothing to stop us from destroying ourselves[25]: if we want a happy ending, we’ll need to breach Heaven ourselves, and bowing our heads and closing our eyes in prayer won’t help when we should be watching the road ahead. It’s going to be a lot of hard work, but this isn’t a cause for despair. This is a call to arms.

So in the past, a successful prodigal son may have gone home for a sense of continuity and purpose, a sense of hope beyond the grave. However, now he doesn’t have to. It’s not just about unrepentant hedonism[26]: we’re getting closer to audacious goals like ending poverty, ending aging, ending death. We won’t wait for a bright new afterlife that isn’t coming: we humanists will do our best, and maybe, just maybe, it will be enough.

No heaven above, no hell below, just us. Let us begin.


[1]  Epistemics: the ability to know things. Epistemic status: how confident I am about the thing I am writing about.

[2]  Senses: saying sorry, and in the sense of apologetics or defending a position. Commonly found as the bi-gram “Christian apologetics”.

[3]  I almost didn’t publish this post, figuring I hadn’t heard from my mom about faith-related topics in a while. Then my mom told all my relatives “We are praying for a godly young woman who can bring <thenoviceoof> back to us”, so here we are.

[4]  A defense of the faith, basically, usually hanging around as a bi-gram like “Christian apologetics”. See Wikipedia.

[5]  Standing where I am now, I can see how the books would paint the strengths of science as weakness: “look at how science has been wrong! And then it changed its mind, like a shifty con-man!” In this respect, the flip-flopping nature of science journalism in fields like nutrition is Not Helping, a way of poisoning the well of confident proclamations of evidence, such that everyone defaults to throwing up their hands in the face of evidence, instead of actually assessing it.

[6]  In retrospect, I had a thing for weirdly implausible theories: I remember being smitten with the idea that all of physics could be explained by continually expanding subatomic particles, a sort of classical Theory of Everything that no one asked for, with at least one gaping hole you could drive trucks through (hint: how do satellites work?).

[7]  We even cautioned ourselves against “spiritual highs”. We would feel something, but the something wouldn’t always be there, which maybe should have tipped me off about something fishy happening. How do they say it, “don’t get high off your own supply”?

[8]  Many children are baptized soon after birth, and confirmed at some later age when they can actually make decisions. Hmm.

[9]  Now, I know that I could tell by listening for European capitals.

[10]  I didn’t actually get to the point of spewing glossolalia: I could hear my youth group leader’s disappointment that I didn’t quite let myself go while repeating “Jesus, I love you” faster than I could speak. And, finding out that no earthly audience would have understood what I was saying was also a shock, like finding out God solely communicated to people through grilled cheeses.

[11]  Talk about being bad at grasping infinities: I couldn’t even grasp 2000 years. “More things than are dreamt of in your philosophy”, etc.

[12]  The obvious rejoinder is that the church is still an earthly institution, and it’s still subject to mundane concerns like balancing the budget: for every Protestant Reformation grounded in theological conflict, there’s another hundred grounded in conflicts over the size of the choir, all because we live in a fallen world. The general counter-principle is that if there’s no way to tell from the behavior of churches whether we’re in a godly or godless world, then the fact there exists a church ceases to count as evidence.

[13]  The fact that some biblical scholars translate “cross” as “tree” makes me suspicious that Waiting for Godot was in fact making this exact reference.

[14]  I didn’t partake; this was after I started being weirded out by the charismatics.

[15]  I’m disappointed I didn’t throw up my hands at some point and yell “I Kant do it!”

[16]  Sure, there were answers, but they weren’t satisfying. You couldn’t get there from here.

[17]  Of course, the trust comes at a price; I wouldn’t want to be trans in a small tight-knit fundamentalist town.

[18]  It’s not clear from the abstract of the paper, but in Age of Em Robin Hanson cites this paper as showing the religious have higher productivity.

[19]  Mostly not serious, since I would expect a jealous Abrahamic God to throw out any spiritual bookies. Also keep in mind that Pascal’s wager falls apart even with the simple addition of multiple gods competing for faith.

[20]  I am totally aware that scientism is normally derogatory. However, science itself doesn’t require the modes of thought that we normally attribute to our current scientific culture.

[21]  One might worry that we would simply export our age-old conflicts and flaws to the stars, in which case they might become… bear with me… the Sins of a Solar Empire?

[22]  “Run for the mountains!” said Apostle Paul. “It is the dawn of the morning Son!” Then Oppenheimer said “someone said they were looking for a dawn?”

[23]  Sapiens notes “Haber won the Nobel prize in chemistry. Not Peace.”

[24]  I’m sorry-not-sorry about the pun. If you don’t get it, Calvary is the hill Jesus supposedly died on, and “ain’t no cavalry coming” is a military saying: there’s no backup riding in to save the day.

[25]  Nukes are traditional, if less concerning these days. Pandemics are flirting on the edge of global consciousness, AI is getting more serious, and meta-things like throwing away our values and producing a “Disneyland without children” are becoming more concerning.

[26]  Just look at what the effective altruists are doing with their 10%.

The Mundane Science of Living Forever


Epistemic Status: timeboxed research, treat as a stepping stone to more comprehensive beliefs. Known uncertainty called out.

Live forever, or die trying!

Previously: Lifestyle interventions to increase longevity @ LessWrong, 9s of cats.

TLDR?

Yes, Immortality

I wrestled with whether to shoot for a more normal and mundane title, like “In Pursuit of longevity”, but “live a long time!” just doesn’t have the ring that “live forever!” does.

Clarification: I don’t have the Fountain of Youth. I’m relying on the future to do the heavy lifting. Kurzweil’s escape velocity is the key idea: we want to live long enough that life expectancy starts increasing by more than 1 year per year. Life expectancy is currently stagnant, so we want to live as long as possible to maximize our chances of hitting some sort of transition.

In other words, we need silver bullets to overcome the Gompertz curve, but there are no silver bullets yet, just boring old lead bullets. We’ll have to make our own silver bullet factory, and use the lead bullets to get there.

So, the bulk of this post will be devoted to simply living healthily. A lot of the advice is boring and standard: eat your vegetables, exercise, get enough sleep. However, I wanted to check out the science and see what holds up under (admittedly amateur) scrutiny.

(I’ll be ignoring the painfully obvious things, like not smoking. If you’re smoking, stop smoking[1].)

My process: I timeboxed myself to 20 hours of research, ending in August 2017. First, I looked up the common causes of death and free-form generated possible interventions. Then, I followed the citations in the Lifestyle interventions to increase longevity post, then searched Google Scholar, especially for meta-analyses, and read the studies, evaluating them in a non-rigorous way. I discarded interventions that I wasn’t certain about: for example, Sarah lists some promising drugs and gene therapies based only on animal studies, where I wanted more certainty. I ended up using 30+ hours, and even so not everything is researched as exhaustively as I would like: there was a fair amount of abstract skimming, and I did not read every paper I reference end-to-end. On the other hand, many papers were locked behind paywalls, so I couldn’t have done much more than that anyway.

This means if you read one of these results and implement it without talking to your doctor about it and bad things happen to you, I will ask you: ARE YOU A SPRING LAMB? WHY THE FUCK ARE YOU DOING THINGS A RANDOM PERSON ON THE INTERNET TOLD YOU TO DO? AND WITHOUT VETTING THOSE THINGS?

Or more concretely: you are a unique butterfly, and no one cares except the medical world. What happens for the faceless statistical masses might not happen for you. I will not cover every single possible interaction and caveat, because that is what those huge medical diagnosis books are for, and I don’t have the knowledge to tell you about the contents of those books. Don’t hurt yourself, ask your doctor.

An example: blood donation

First, I wanted to lead with an example of how the wrong methods can prop up a conclusion that doesn’t hold up under better data.

Now, blood donation looks like it is very, very good for male health outcomes. From “Blood donation and blood donor mortality after adjustment for a healthy donor effect” (2015, N=1,182,495; note it’s just an abstract, but the abstract has the data we want):

» For each additional annual blood donation, the all-cause mortality RR (relative risk) is 0.925, with a 95% CI (confidence interval) from 0.906 to 0.943[2]. I’ll be summarizing this information as RR = 0.925[0.906, 0.943] throughout the post.

(Unless otherwise stated, in this post an RR measure will refer to all-cause mortality, and X[Y, Z] CI reports will be values followed by 95% confidence intervals. There will also be references to OR (odds ratio) and HR (hazard ratio)).

There’s even a well fleshed out mechanism, where excess iron ends up oxidizing and damaging parts of the cardiovascular system, and hence regular blood donation helps by removing excess blood iron.
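
To get a sense of how large that effect looked: if the per-donation effect compounded multiplicatively (which the abstract doesn’t actually promise), someone donating four times a year would sit at roughly

RR \approx 0.925^4 \approx 0.73

which would be a spectacular effect for such a cheap intervention, and all the more reason to be suspicious.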

But there are some possible confounders:

  • blood donation carries some of the most stringent health screens most people face, which results in a healthy donor effect,
  • altruism could be correlated with conscientiousness, which might affect health outcomes.

The study cited earlier is observational: they’re looking at existing data gathered in the course of normal donation and studying it to see if there’s an effect. In order to make a blanket recommendation that men should donate blood at some regular interval, what we really want is to isolate the effect of donation by putting people through the normal intake and screening process, and then right before putting the needle in randomize for actually taking the donation or not, or even stick the needle in and not actually draw blood.

(Note that randomization is not strictly better than observational studies: observations can provide insights that randomization would miss[3], and a rigorous RCT might not match real world implementations. Nevertheless, most of the time I want a randomized trial.)

No one had done an RCT (randomized controlled trial) in this fashion, and I expect any such study would have a really hard time passing an ethics board, given that I get numerous calls throughout the year to help alleviate emergency blood shortages.

However, Quebec noticed that their screening procedures were too strict: a large group of people were being rejected when they were in fact perfectly healthy. The rejection trigger didn’t appear to otherwise correlate with health, so this was about as good a randomized experiment as we were going to get. Their results were reported in “Iron and cardiac ischemia: a natural, quasi-random experiment comparing eligible with disqualified blood donors” (2013, N=63,246):

» Donors vs nondonors, RR = 1.02[0.92, 1.13]

In other words, there was basically no correlation. In fact, in another section of the paper the authors could get the correlation to come back by slicing their data in a way that better matched the healthy donor process.

The usual weak points that laypeople can pick apart aren’t there: the N is large, there’s a large cross-section of the community (no elderly Hispanic women effect), and there’s no way to even fudge our interpretation of the numbers: we’re not beholden to science’s fetish with p=0.05, so failing the 95% CI could be okay if the estimate were definitely leaning in the right direction. But it’s almost exactly in the middle. The effect isn’t there, or is so tiny that it’s not worth considering.


So that’s an example of how things can look like great interventions, and then turn out to have basically no effect. With our skeptic hats firmly in place, let’s dive into the rest!

Easy, Effective

Vitamin D

Vitamin D gets the stamp of approval from both Cochrane and Gwern[4]. Lots of big randomized studies have been done with vitamin D supplementation, so the effect size is pretty pinned down.

From “Vitamin D supplementation for prevention of mortality in adults” (2012, N=95,286, Cochrane):

» Supplementation with vitamin D vs none, RR = 0.94[0.91, 0.98]

Another meta-analysis used by Gwern, “Vitamin D with calcium reduces mortality: patient level pooled analysis of 70,528 patients from eight major vitamin D trials” (2012, N=70,528):

» Supplementation with vitamin D vs none, HR = 0.93[0.88, 0.99]

You might think that one side of the CI is pretty bad, since RR = 0.98 means the intervention is almost the same as the control. On the other hand, (1) wait until you read the rest of the post, and (2) keep in mind that it’s very cheap to supplement vitamin D: your local drugstore probably has a year’s worth for $20. In a pinch, more sunlight also works, but if you have darker skin, sunlight is less effective.

If you’re interested, there’s lots of hypothesizing on the mechanisms by which more vitamin D impacts things like cardiovascular health (overview).

(If you want a striking visual example of vitamin D precursors correlating with cancer, there’s a noticeable geographic gradient in certain cancer deaths; “An estimate of premature cancer mortality in the U.S. due to inadequate doses of solar ultraviolet-B radiation” (2002) states that some cancers are twice as prevalent in the northern US as in the southern. There’s more sun in the south, and sunlight helps synthesize vitamin D. Coincidence?! If you want to, you can see this effect yourself by going to the Cancer Mortality Maps viewer from the National Cancer Institute and taking a look at the bladder, breast, corpus uteri or rectum cancers.)

Difficult, but Effective

Exercise

Exercising is hard work, but it pays off big.

From “Domains of physical activity and all-cause mortality: systematic review and dose–response meta-analysis of cohort studies” (2011, N=unknown subset of 1,338,143[5]):

» Comparing people that get 300 minutes of moderate-vigorous exercise/week vs sedentary populations, RR = 0.74[0.65, 0.85]

Unfortunately, “moderate-vigorous” is pretty vague, and the number of multiple comparisons being made is breathtaking.

A MET-h is a unit of energy expenditure: 1 MET-h is roughly what you burn sitting and doing nothing for an hour. Converting different exercises (or intensities of exercise) into MET-h makes it possible to directly compare/aggregate different exercise data. It also makes it easier to pin down what “moderate-vigorous” exercise means, roughly mapping to less than 3 METs for light, 3-6 METs for moderate, and above 6 METs for vigorous (so an hour of moderate 4 MET exercise is 4 MET-h).

With this in mind, we can get a regression seeing how additional MET-hs impact RR. From the previous study (2011, N=unknown subset of 844,026):

» +4 MET-h/day, RR = 0.90[0.87, 0.92] (roughly mapping to 1h of moderate exercise)

» +7 MET-h/day, RR = 0.83[0.79, 0.87] (roughly mapping to 1h vigorous exercise)

There’s a limit, though: exercising for too long, or too hard, will eventually stop providing returns. The same study places the upper limit at around a maximum RR = 0.65 when comparing the highest and lowest activity levels. The Mayo Clinic in “Exercising for Health and Longevity vs Peak Performance: Different Regimens for Different Goals” recommends capping vigorous exercise at 5 hours/week for longevity.

A quick rule of thumb is that each hour of exercise can return 7x time dividends (news article). This sounds great, but do some math: put this return together with the 5 hours/week limit, assume that you’re 20yo and doing the maximum exercise you can until 60, and this works out to adding roughly 8 years to your life (note that the study the rule of thumb is based on (2012) gives a slightly lower average maximum gain, around 7 years). Remember the Gompertz curve? We can huff and puff to get great RRs, and it only helps a bit. Unfortunate.
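
Here’s that back-of-the-envelope math spelled out. It’s a rough sketch that takes the 7x rule of thumb and the 5 hours/week cap at face value and assumes you keep it up from age 20 to 60; everything in it is a simplifying assumption rather than anything pulled from the underlying study.

    # Rough arithmetic behind the "roughly 8 years" figure above.
    HOURS_PER_YEAR = 365.25 * 24

    exercise_hours = 5 * 52 * (60 - 20)        # 5 h/week from age 20 to 60: 10,400 h
    returned_hours = 7 * exercise_hours        # 7x time dividend: 72,800 h

    print(returned_hours / HOURS_PER_YEAR)     # ~8.3 years gained
    print(exercise_hours / HOURS_PER_YEAR)     # ~1.2 years actually spent exercising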

(While we’re exercising: keep in mind that losing weight isn’t always good: if you’re already at a healthy weight and start losing weight without intending to, that could be a sign that you’re sick and don’t know it yet (source).)

Other studies I looked at:

Unfortunately, most of these studies are based on surveys, which have the usual problems with self reports. There are some studies based on measuring VO2max more rigorously as a proxy for fitness, except those have tiny Ns, in the tens if they’re lucky (it’s expensive to measure VO2max!).

Diet

Overall, many of these studies are observational and based on self-reports; a few are based on randomized provided food, but the economics dictate they have smaller Ns. I’ve put all the diet-related things together, since in aggregate they are fairly impactful (if difficult to put into practice), but note that some of the subheadings contain less certain results.

Fruit and vegetables

It’s like your childhood authority figures said: eat your vegetables.

From “Fruit and vegetable consumption and mortality from all causes, cardiovascular disease, and cancer: systematic review and dose-response meta-analysis of prospective cohort studies” (2014, N=833,234):

» +1 serving fruit or vegetable/day, HR = 0.95[0.92, 0.98]

Like exercise, fruits/vegetables don’t stack forever either; there’s around a 5 serving/day limit after which effects level off. Still, compounding 0.95 per serving over 5 servings gives roughly 0.95^5 ≈ 0.77, which adds up to around HR = 0.75, competitive with maximally effective exercise.

Potatoes are a notable exception, having a uniquely high glycemic load among vegetables; this roughly means that your blood sugar will spike after eating potatoes, which seems bad. You can find plenty of debate about whether this is in fact bad[6].

Other reports I looked at:

Red/Processed Meat

You know bacon is bad for you, but… bacon is pretty bad for you.

From “Red Meat and Processed Meat Consumption and All-Cause Mortality: A Meta-Analysis” (2013, N=unknown subset of 1,330,352), which covers both plain red meat (hamburger, steak) and processed red meat (dried, smoked, bacon):

» Highest vs lowest consumption categories[7] for red meat, RR = 1.10[0.98, 1.22]

» Highest vs lowest consumption categories for processed red meat, RR = 1.23[1.17, 1.28]

I couldn’t find all-cause data on fried foods specifically, but “Intake of fried meat and risk of cancer: A follow-up study in Finland” covers cancer risks (1994, N=9,990):

» Highest vs lowest tertile of fried meat, RR = 1.77[1.11, 2.84]

Note that the confidence intervals are wide: for example, the red meat CI covers 1.0, which is pretty poor (and yet the best all-cause data I could find). If we were strictly following NHST (null hypothesis significance testing), we’d fail to reject the null and couldn’t claim an effect. However, I’ll begrudgingly accept waggled eyebrows and “trending towards significance”[8].

If you’re paleo, you might not have cause to worry, since you’re probably eating better than most other red meat eaters, but I have no data for your specific situation.

Other reports I looked at:

Fish (+Fish oil)

Fish is pretty good for you! Fish oil might count towards fish “consumption”.

“Risks and benefits of omega 3 fats for mortality, cardiovascular disease, and cancer: systematic review” (2006, N=unknown subset of 36,913) looked at both fish consumption and fish oil, finding that fish/fish oil weren’t significantly different:

» High omega-3 (both advice to eat more fish, and supplementation) vs low, RR = 0.87[0.73, 1.03]

Note this analysis only included RCTs.

“Association Between Omega-3 Fatty Acid Supplementation and Risk of Major Cardiovascular Disease Events: A Systematic Review and Meta-analysis” (2012, N=68,680) looked only at fish oil supplementation:

» Omega-3 supplementation vs none, RR = 0.96[0.91, 1.02]

Note that both of these results have relatively wide CIs covering 1.0. Additionally, the two studies seem to differ on the relative effectiveness of fish oil.

There’s plenty of exposition on mechanisms for why fish oil (omega-3 oil) might help in the AHA scientific statement “Fish Consumption, Fish Oil, Omega-3 Fatty Acids, and Cardiovascular Disease”.

Also make sure that you’re not eating mercury laden fish while you’re at it; just because Newton did it doesn’t mean you should.

Other studies I looked at:

Nuts

This study of Seventh-day Adventists, “Nut consumption, vegetarian diets, ischemic heart disease risk, and all-cause mortality: evidence from epidemiologic studies”, points in the right direction (1999, N=34,198):

» Eating nuts >=5 times/week vs <1 time/week, fatal heart attack RR ~ 0.5[0.32, 0.75] (estimated from a graph)

However, I don’t trust it. Look at how implausibly low that RR is: eating nuts is better than getting the maximum benefit from exercise? How in the world would that work? Unfortunately, I wasn’t able to find any studies that weren’t confounded by religion, so I just have to stay uncertain for now.

Sleep

We spend a third of our lives asleep; of course it matters. The easiest thing to measure about sleep is its length, so plenty of studies have been done on that. You want to hit a Goldilocks zone of sleep length, neither too short nor too long. The literature calls this the aptly named U-shaped response.

What’s too short, or too long? It’s frustrating, because one study’s “too long” can be another study’s “too short”, and vice versa.

However, from “Sleep Duration and All-Cause Mortality: A Systematic Review and Meta-Analysis of Prospective Studies” (2010, N=1,382,999):

» Too short (<4-7h), RR = 1.12[1.06, 1.18]

» Too long (>8-12h), RR = 1.30[1.22, 1.38]

And from “Sleep duration and mortality: a systematic review and meta-analysis” (2009, N=unknown):

» Too short (<7h), RR = 1.10[1.06, 1.15]

» Too long (>9h), RR = 1.23[1.17, 1.30]

So there’s a range right around 8 hours that most studies can agree is good.

You might be fine outside of the Goldilocks zone, but if you haven’t made special efforts to get into the zone, you might want to try and get into that 7-9h zone the studies can generally agree on.

Again, most of these studies are survey based. I can’t find the source, but a possible unique confounder is that sleeping unusually long might be a dependent, not independent variable: if you’re sick but don’t know it, one symptom could manifest as sleeping more.

And, if you get enough sleep but feel groggy, you might want to get checked out for sleep apnea.

Other studies I looked at:

Less Effective

Flossing

The original longevity guide was enthusiastic about flossing. Looking at “Dental Health Behaviors, Dentition, and Mortality in the Elderly: The Leisure World Cohort Study” (2011, N=690), it’s hard not to be:

» Among daily brushers, never vs everyday flossers, HR = 1.25[1.06, 1.48]

Even more exciting are the dental visit results (N=861):

» Dental exam twice/year vs none, HR = 1.48[1.23, 1.79]

However, the study primarily covers the elderly, with an average age of 81. Sure, one hopes that the effects are universal, but the non-representative population makes it hard to generalize. So while flossing looks good, I’m not ready to trust one study, especially when I can’t find a reasonable meta-analysis covering more than a few hundred people.

As a counterpoint, Cochrane looked at flossing specifically in “Flossing to reduce gum disease and tooth decay” (2011, N=1083), finding that there’s weak evidence for reduction in plaque, but basically nothing else.

I’ll keep flossing, but I’m not confident about the impact of doing so.

Other studies I looked at:

Sitting

Sitting down all day might-maybe-possibly be bad for health outcomes.

There are some studies trying to measure the impact of sitting length. From “Daily Sitting Time and All-Cause Mortality: A Meta-Analysis” (2013, N=595,086):

» +1 hour sitting with >7 hours/day sitting, HR = 1.05[1.02, 1.08]

However, the aptly named “Does physical activity attenuate, or even eliminate, the detrimental association of sitting time with mortality? A harmonised meta-analysis of data from more than 1 million men and women” (2016, N=1,005,791, no full text) claims the correlation only holds at low levels of activity: once people start getting close to the exercise limit, this study found the correlation between sitting and all-cause mortality disappeared.

From “Leisure Time Spent Sitting in Relation to Total Mortality in a Prospective Cohort of US Adults” (2010, N=53,440):

» Sitting >6 hours vs <3 hours/day (leisure time), RR 1.17[1.11, 1.24]

Note that this is the effect for men: the effect for women is larger. Also, this study directly contradicts the other study, claiming that sitting time has an effect on mortality regardless of activity level. And who in the world sits for less than 3 hours/day during their leisure time? Do they just not have leisure time?

Again, these studies were survey based.

The big unanswered question in my mind is whether exercising vigorously will just wipe out the need to not sit. So, I’m not super confident you should get a fancy sit-stand desk.

(However, I do know that writing this post meant so much sitting that my butt started to hurt, so even if it’s not for longevity reasons I’m seriously considering it.)

Other reports I looked at:

Air quality

Air quality has a surprisingly small impact on all-cause mortality.

From “Meta-Analysis of Time-Series Studies of Air Pollution and Mortality: Effects of Gases and Particles and the Influence of Cause of Death, Age, and Season” (2011, N=unknown (but aggregated from 109 studies(?!))):

» +31.3 μg/m3 PM10, RR = 1.02[1.015, 1.024]

» +1.1 ppm CO, RR = 1.017[1.012, 1.022]

» +24.0 ppb NO2, RR = 1.028[1.021, 1.035]

» +31.2 ppb O3 daily max, RR = 1.016[1.011, 1.020]

» +9.4 ppb SO2, RR = 1.009[1.007, 1.012]

(I’m deriving the RR from percentage change in mortality.)

By themselves these RR increments aren’t overwhelming. But since they’re expressed as increments, if there were 50 increments’ worth of pollution in a normal day that we could filter out ourselves, that would add up to some real impact. Unfortunately, the increments aren’t tiny compared to the absolute ambient values. For example, maximum values in NYC during the 2016 summer:

PM10 ~ 58 μg/m3

CO ~ 1.86 ppm

NO2 ~ 60.1 ppb

O3 ~ 86 ppb

SO2 ~ 7.3 ppb

So the difference between a heavily trafficked metro area and a clean room is maybe twice the percentage impacts we’ve seen, which just doesn’t add up to very much. Beijing is another story, but even then I (baselessly) question the ability of household filtration systems to make a sizable dent in interior air quality.
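
To make the “doesn’t add up to much” claim concrete, here’s a sketch that scales each per-increment RR up to the NYC maxima, assuming (as I’m reading these studies to imply) that the excess risk compounds log-linearly with concentration and that the baseline is zero exposure:

```python
# Per-increment RRs from the meta-analysis, as (increment size, RR per increment).
increments = {
    "PM10": (31.3, 1.02),
    "CO":   (1.1, 1.017),
    "NO2":  (24.0, 1.028),
    "O3":   (31.2, 1.016),
    "SO2":  (9.4, 1.009),
}
# NYC summer 2016 maxima (same units as the increments above).
nyc_max = {"PM10": 58, "CO": 1.86, "NO2": 60.1, "O3": 86, "SO2": 7.3}

for pollutant, (step, rr) in increments.items():
    # Assumed log-linear scaling: the RR compounds once per increment of concentration.
    scaled = rr ** (nyc_max[pollutant] / step)
    print(pollutant, round(scaled, 3))  # each comes out around 1.01-1.07
```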

There are plenty of possible confounders: it seems the way these sorts of studies are run is by looking at city-level pollution and mortality data, and running the regressions on those data points.

Other studies I looked at:

Hospital Choice

Going to the hospital isn’t great: medical professionals do the best they can, but they’re still human and can still screw up. It’s just that the stakes are really high. Like, people recommend marking on yourself which side a pending operation should be done on, to reduce chances of catastrophic error.

Quantitatively, “A New, Evidence-based Estimate of Patient Harms Associated with Hospital Care” (2013) says that 1% of deaths in the hospital are adverse deaths. However, note that many adverse deaths weren’t plausibly preventable by anyone other than Omega.

If you’re having a high stakes operation done, “Operative Mortality and Procedure Volume as Predictors of Subsequent Hospital Performance” (2006) recommends taking into account a hospital’s historical mortality rate and volume for that procedure: if you’re getting heart surgery, you want to go to the hospital that handles lots of heart surgeries, and has done so successfully in the past.

Other studies I looked at:

Green tea

Unfortunately, there’s no all-cause mortality data on the impact of tea in general or green tea in particular. We might expect it to have an effect through flavonoids.

As a proxy, though, we can look at blood pressure, where lower blood pressure is better. From “Green and black tea for the primary prevention of cardiovascular disease” (2013, N=821):

» Systolic blood pressure, -3.18[-5.25, -1.11] mmHg

» Diastolic blood pressure, -3.42[-4.54, -2.30] mmHg

There’s a smaller effect from black tea, around half the size.

Cochrane also looked at green tea prevention rates for different cancers. From “Green tea for the prevention of cancer” (2009, N=1.6 million), it’s unclear whether there’s any strong evidence of effect for any cancer, in addition to there being a possible garden of forking paths.

If you’re already drinking tea, like me, then switching to green tea is low cost despite any questions about efficacy.

Borderline efficacy

Baby Aspirin

The practice of taking tiny daily doses of aspirin, mainly to combat cardiovascular disease. From “Low-dose aspirin for primary prevention of cardiovascular events in Japanese patients 60 years or older with atherosclerotic risk factors: a randomized clinical trial.” (2014, N=14,464):

» Aspirin vs none, aggregate cardiovascular mortality HR = 0.94[0.77, 1.15]

That CI width is very concerning; you can cut the data so that subsets of cardiovascular mortality become significant, like looking at only non-fatal heart attacks, but it’s not like there’s a whiff of correction for multiple comparisons anywhere, and the study was stopped early due to “likely futility”.

The side effects of baby aspirin are also concerning. Internal bleeding is possible (Mayo clinic article), since aspirin is acting as a blood thinner; however, it isn’t too terrible, since it’s only a 0.13% increase in “serious bleeding” that resulted in hospitalization (from “Systematic Review and Meta-analysis of Adverse Events of Low-dose Aspirin and Clopidogrel in Randomized Controlled Trials” (2006)).

More concerning is the stopping effect. “Low-dose aspirin for secondary cardiovascular prevention – cardiovascular risks after its perioperative withdrawal versus bleeding risks with its continuation – review and meta-analysis” looked into cardiovascular risks when stopping a baby aspirin regime before surgery (because of increased internal bleeding risks), and found that a low single-digit percentage of heart attacks happened shortly after aspirin discontinuation. (I’m having trouble interpreting this report.)

I imagine this is why professionals start recommending baby aspirin to folks above 50yo, since the risks of heart attack start to obviously outweigh the costs of taking aspirin constantly. And speaking of cost: baby aspirin is monetarily inexpensive.

Other studies I looked at:

Meal Frequency

Some people recommend eating smaller meals more frequently, particularly to lose weight, which is tied to health outcomes.

From “Effects of meal frequency on weight loss and body composition: a meta-analysis” (2015, N=unknown):

» +1 meal/day, -0.27 ± 0.11 kg of fat mass

It’s not really an overwhelming result; taking into account the logistical overhead of planning out extra meals in a society based on 3 square meals a day, is it really worth it to lose maybe half a kilogram of fat?

Other studies I looked at:

Caloric Restriction

Most longevity folks are really on board the caloric restriction (CR) train. There’s an appealing mechanism where lower metabolic rates produce fewer free radicals to damage cellular machinery, and it’s the exact amount of effort that one might expect from a longevity intervention that actually works.

A common example of CR is the Japanese Ryukyu islands, where there are a surprising number of really old people, who eat a surprisingly low number of calories. However, say it with me: con-found-ed to he-ll! The fact that a single isolated subsection of a single ethnic group have a correlation between CR and longevity doesn’t make me confident that I too can practice CR and tell death to fuck off for a few more years.

So we want studies. Unfortunately, most humans fall into the state of starving and lacking essential nutrients, or having enough calories and nutrients, but almost never the middle ground of having too few calories but all the essential nutrients (2003, literature review). Then there’s the ethics of getting humans to agree to a really long study that controls their diet, so let’s look at animal studies first.

However, different rhesus monkey studies give different answers.

» From “Impact of caloric restriction on health and survival in rhesus monkeys from the NIA study” (2012, N=unknown, no full text), there was no longevity increase in either young or old rhesus monkeys.

» However, from “Caloric restriction delays disease onset and mortality in rhesus monkeys” (2009, N=76), there was a 30% reduction in death over 20 years.

Thankfully they’re both randomized, but it doesn’t really help when they end up with conflicting conclusions. You’d hope there would be better support even in animal models for something that should have huge impacts.

What else could we look at? We’re not going to wait for an 80-year human study to finish (the ongoing CALERIE study comes close), so maybe we could look at intermediate markers that are known to have an impact on longevity and go from there.

A CALERIE checkpoint study, “A 2-Year Randomized Controlled Trial of Human Caloric Restriction: Feasibility and Effects on Predictors of Health Span and Longevity” (2015, N=218), looks at the impact of 25% CR on blood pressure:

» Mean blood pressure change, around -3 mmHg (read from a chart)

Pretty good, but that’s also around the impact of green tea. Then, there’s the implied garden of forking paths bringing in multiple comparisons, since the study in the same cluster looks at multiple types of cholesterol and insulin resistance markers.

Finally, there are the costs: you have to exert plenty of willpower to actually accomplish CR. For something with such large costs, the evidence base just isn’t there.

Chocolate

Chocolate has some impact on blood pressure. “Effect of cocoa on blood pressure” (2017, N=1804, Cochrane) finds that eating chocolate lowers your blood pressure:

» Systolic blood pressure, -1.76[-3.09, -0.43] mmHg

» Diastolic blood pressure, -1.76[-2.57, -0.94] mmHg

However, if you’re normotensive then there’s no impact on blood pressure, and if you only count hypertensives the effect jumps to around -4 mmHg. Feel free to keep eating your chocolate, but don’t expect miracles.

Social Interaction

Having a social life looks like a really great intervention.

From “Social Relationships and Mortality Risk: A Meta-analytic Review” (2010, N=308,849):

» Weaker vs stronger relationships, OR = 1.50[1.42, 1.59]

And from “Social isolation, loneliness, and all-cause mortality in older men and women” (2013, N=6500):

» Highest vs other quintiles of social isolation, HR = 1.26[1.08, 1.48]

And from “Marital status and mortality in the elderly: A systematic review and meta-analysis” (2007, N>250,000, no full text):

» Married vs all currently non-married, RR = 0.88[0.85, 0.91]

You can propose a causal mechanism off the top of your head: people with more friends are less depressed, which in turn has good health outcomes.

However, the alarm bells should be ringing: is the causal relationship backwards? Are healthier people more prone to socializing? Do the confounders never end? The kicker is that all these studies are looking at the elderly (above 50yo at least), which reduces their general applicability even more.

Other studies I looked at:

Cellphone Usage

Remember when everyone was worried that chronic cellphone usage was going to give us all cancer?

Well “Mobile Phone Use and Risk of Tumors: A Meta-Analysis” (2008, N=37,916) says it actually does:

» Overall tumor, OR = 1.18[1.04, 1.34]

» Malignant tumor, OR = 1.00[0.89, 1.13]

Since we’re worried about malignant tumors, it’s hard to say we should be worried by cellphones.

Other studies I looked at:

Unproven

Confusing thirst with hunger

Some people recommend taking a drink when you feel hungry, the idea being that thirst sometimes manifests as hunger, and you can end up eating fewer calories.

Unfortunately, I couldn’t find any studies that tried to look into this specifically: the closest thing I found was “Hunger and Thirst: Issues in measurement and prediction of eating and drinking” (2010) which reads like a freshman philosophy paper, and “Thirst-drinking, hunger-eating; tight coupling?” (2009, N=50?) which fails to persuade me about… anything, really.

Stress Reduction in a Pill

There are some “natural” plants rumored to have stress reduction effects: Rhodiola rosea and Ashwagandha root.

Meta-analysis on Rhodiola, “The effectiveness and efficacy of Rhodiola rosea L.: A systematic review of randomized clinical trials” (2011, N=unknown) found that Rhodiola had effects on something, but the study was basically a fishing expedition. Even the study name betrays that it doesn’t matter what it’s effective at, just that it’s effective.

Another meta-analysis, “Rhodiola rosea for physical and mental fatigue: a systematic review” (2012, N>176) looked specifically at fatigue and found mixed results.

For Ashwagandha, “Prospective, Randomized Double-Blind, Placebo-Controlled Study of Safety and Efficacy of a High-Concentration Full-Spectrum Extract of Ashwagandha Root in Reducing Stress and Anxiety in Adults” (2012, N=64) found reductions in self-reported stress scales and cortisol levels (not a meta-analysis, but at least it’s an RCT!).

Look, the Ns are tiny, and the studies the meta-analyses are based on are old, and who knows if the Russians were conducting their side of the studies right (Rhodiola originated in Russia, so many of the studies are Russian).

I’m including this because I got excited when I saw it in the original longevity post: stress reduction in a pill! Why do the hard work of meditation when I could just pop some pills (a very American approach, I know)? It just doesn’t look like the evidence base is trustworthy, and my personal experiences confirm that if there’s an effect it’s subtle (Whole Foods carries both Rhodiola and Ashwagandha, so you can try them out for yourself for like $20).

Other studies I looked at:

Water Filters

Unfortunately, there’s basically no research on health effects from water filtration in 1st world countries above and beyond municipal water treatment. Most filtration research is either about how adding any filtration to 3rd world countries has massive benefits, or how bacteria can grow on activated carbon granules. Good to know, but on reflection, did we expect bacteria to stop growing wherever they damn well please?

So keep your Brita filter, but it’s not like we know for sure whether it’s doing anything either. Probably not worth it to go out of your way to get one.

Hand sanitizer

So I keep hand sanitizer in multiple places in my apartment, but does it do anything?

I only found “Effectiveness of a hospital-wide programme to improve compliance with hand hygiene” (2000, N=unknown), which focused on hospital health outcomes impacted by hand washing adherence. First, not all doctors wash their hands regularly (40% compliance rates in 2011) (scholarly overview), which is worrying. Second, there’s a positive trend between hand washing (including hand sanitizers) and outcomes:

» From moving 48% hand washing adherence to 66%, the hospital-wide infection rate decreased from 16.9% to 9.9%.

However, keep in mind that home and work are usually less adverse environments than a hospital; there are fewer people with compromised immune systems, there are fewer gaping wounds (hopefully). The cited result is probably an upper bound for us non-hospital folk.

(There’s also this cute study: hand sanitizer contains chemicals that make it easier for other chemicals to penetrate the skin, and freshly printed receipts have plenty of BPA on the paper. This means that sanitizing and then handling a receipt will lead to a spike of BPA in your bloodstream. I presume that relative to eating with filthy hands the BPA impact is negligible, but damn it, researchers are doing these cute small scale studies instead of the huge randomized trials I want.)

Other studies I looked at:

Doctor visits

Should you visit your doctor for an annual checkup? My conscientious side says “of course”, but my contrarian side says “of course not”.

Well, “General health checks in adults for reducing morbidity and mortality from disease” (2012, N=182,880, Cochrane) says:

» Annual checkup vs no exam, RR = 0.99[0.95, 1.03]

So basically no impact! Ha, take that, couple hour appointment!

However, The Chicago Tribune notes some mitigating factors, like the fact that the main studies the meta-analysis is based on are old, as in 1960s old.

Metformin

I didn’t look at metformin in my main study period: I knew it had some interesting results, but it also caused gastrointestinal distress, better known as diarrhea. It brings to mind the old quip: metformin doesn’t make you live longer, it just feels like it[9].

However, while I was reading Tools of Titans, Dominic D’Agostino floated an intriguing idea: he would titrate the metformin dose up from some tiny amount until he started exhibiting GI symptoms, and then dial it back a touch. I don’t think people have even started doing small scale studies around this, but it might be worth looking into.

Other

There’s some stuff that doesn’t have a cost-benefit calculation attached, but I’m including anyways. Or, there are things that won’t help you, but might help the people around you.

CPR

From “Effectiveness of Bystander-Initiated Cardiac-Only Resuscitation for Patients With Out-of-Hospital Cardiac Arrest” (2007, N=4902 heart attacks):

» Cardiac-only CPR vs no CPR, OR 1.72[1.01, 2.95]

So the odds ratio looks pretty good, except that the CI is really wide, and in absolute terms most people still die from heart attacks: administering CPR raises the chances of survival from 2.5% to 4.3%. So, spending more than a few hours practicing CPR is chasing some really thin tail risks[10].
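
For the curious, here’s how the odds ratio lines up with those absolute numbers (a sketch assuming the ~2.5% no-CPR survival baseline):

```python
# Convert the no-CPR survival probability to odds, apply OR = 1.72, convert back.
p_no_cpr = 0.025
odds_no_cpr = p_no_cpr / (1 - p_no_cpr)   # ~0.026
odds_cpr = odds_no_cpr * 1.72
p_cpr = odds_cpr / (1 + odds_cpr)
print(round(p_cpr, 3))                    # ~0.042, i.e. roughly the quoted 4.3%
```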

However, have two people in your friend group that know CPR, and they can provide a potential buff to everyone around them (two, because you can’t give CPR to yourself). In a similar vein, the Heimlich maneuver might be good to know.

Other studies I looked at:

Smoke Alarm testing

Death by fire is not super common. That said, these days it’s cheap to set up a reminder to check your alarm on some long interval, like 6 months.

Quikclot

It’s unlikely you’ll need to do trauma medicine in the field, but if you’re paranoid about tail risk then quikclot (and competitors) can serve as a buttress against bleeding out. Some folks claim that tourniquets are better, but the trauma bandages are a bit more versatile, since you can’t tourniquet your chest.

It’s not magical: since the entire thing becomes a clot, it’s basically just moving a life-threatening wound from the field into a hospital. Also make sure to get the bandage form, not the powder; some people have been blinded when the wind blew the clot precursor into their eyes.

Cryonics

Of course, this post wouldn’t be complete without a nod to cryonics. It’s the ultimate backstop. If all else fails, there’s one last option to make a Hail Mary throw into the future.

Obviously there are no empirical RR values I can give you: you’ll have to estimate your own probabilities and weigh your own values.

WTF, Science?

The overarching story is that we cannot trust anything: almost all the studies are observational, and everything could be confounded to hell by things outside the short list of covariates that every study incants it controlled for, and we would have no idea.

Like Gwern says, even the easiest things to randomize, like giving people free beer, aren’t being done, much less on a scale that could give us some real confidence.

There is too little regard for the garden of forking paths in this post-replication-crisis world, and many studies are focused on subgroups that plausibly won’t generalize (ex. the elderly).

And what’s up with the heterogeneity in meta-analyses? If every single analysis results in “these results displayed significant heterogeneity”, then what’s the point? What are we doing wrong?

What am I doing?

Maybe you want to know what I myself am doing; I suspect people would be interested for the same reason journalists intersperse a perfectly good technical thriller with human interest vignettes, so here:

  • Continuing vitamin D supplementation, and getting a couple minutes of sun when I can.
  • Making an effort to eat more vegetables, less bacon/potatoes (to be honest, I’m more optimistic about cutting out the bacon than potatoes), more fish, and replacing more of my snacking with walnuts.
  • Keep taking fish oil.
  • Exercise better: I haven’t upped the intensity of my routine in a while. I probably need some more aerobic work, too.
  • Tell myself I should iron out my sleep schedule.
  • Get myself a standing desk for home: I have a standing desk at work, so I’m already halfway there.
  • Buy an air filter: low impact, but whatever, gimmie my percentage points of RR.
  • Switch from drinking black tea to green tea.
  • Cut back on donating blood. I’ll keep doing it because it’s also wrapped up in “doing good things”, but I was doing it partly selfishly based on the non-quasi-randomized studies. Besides, I have shitty blood.

TLDR

Effective and certain:

  • Supplement vitamin D.

Effective, possibly confounded:

  • Exercise vigorously 5 hours/week.
  • Eat more fruits and vegetables, more fish, less red meat, cut out the bacon.
  • Get 7-9 hours of sleep.

Less effective, less certain:

  • Brush your teeth and floss daily.
  • Try to not sit all day.
  • Regarding air quality, don’t live in Beijing.

There is also a presentation.


[1]  If you need me to go through the science of smoking, then let me know and I can do so: I mostly skipped it because I’m already not smoking, and the direction of my study was partly determined by what could be applicable to me. As a non-smoker, I didn’t even notice it was missing until a late editing pass.

[2]  The abstract reports results in terms of percentage mortality decrease, which I believe maps to the same RR I gave.

[3]  If I remember correctly, Due Diligence talks about this.

[4]  The Cochrane Group does good, rigorous analysis work. Gwern is an independent researcher in my in group, and he seems to be better at this sort of thing than I am.

[5]  Annoyingly, some meta-analyses don’t report the aggregate sample sizes for analyses that only use a subset of the analyzed reports.

[6]  For example, Scott’s review of The Hungry Brain points out that some people think potatoes are great at satiating appetites, so it might in fact work out in favor of being okay.

[7]  These category comparisons are loose, since some studies will report quartiles and others will use tertiles, so the analysis simply goes with the largest effect possible across all studies.

[8]  Yes, it’s fucking stupid I have to stoop to this.

[9]  Originally “marriage doesn’t make you live longer, it just feels like it.”

[10]  I know, it’s ironic that I’m calling this a tail risk, when we’re pushing something as stubborn as the Gompertz curve.

9s of Cats


Epistemic status: value judgement.

The internet has a lot of cat pictures.

Let's say I upload a cat picture to Amazon's Simple Storage Service (S3). As of writing, their marketing materials claim that a stored object is 99.999999999% likely to stay securely stored in a year, which translates into a 50% chance of losing a given cat picture once every 70 billion years[1]. In storage/networking jargon, this is 11 9's of durability, a sort of fast n' dirty logarithmic shorthand for how reliable a service is, found by counting the 9s in the percentage. For example, 99.9% would be 3 9's.

This doesn't mean that Amazon is super optimistic and thinks the chance of total nuclear war or perfect storm pandemic is some tiny percentage. It's just that if civilization does collapse then former customers would want Amazon warriors over Amazon refunds. Conditional on the continued existence of Amazon, the business, they'll probably keep doing crazy replication schemes[2] to maintain those crazy guarantees.

However, smaller apocalypses will leave Amazon broken while humanity lives on[3]. In these futures, I could easily imagine children gathered around a working fin de siècle computer wondering why in the world this cat looks so grumpy.

So certain cat photos might in fact have 11 9's of durability, enough to live 9 lives over and over. What about humans?

Looking at the 2014 CDC death rates, there are 823.7 deaths/100,000 people, working out to a 99.18% annual durability for a randomly selected human (American): a measly 2 9s of durability. If you show someone a cat picture when they are 12, at best you can expect them to hold onto that memory with 2 9s of durability, because after that they are likely dead[4].
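
The arithmetic behind both numbers is the same, and short enough to sketch (treating the annual rates as constant, which they certainly aren't for humans):

```python
import math

def nines(annual_survival):
    # Count of 9s: order of magnitude of the annual loss probability.
    return -math.log10(1 - annual_survival)

def half_life_years(annual_survival):
    # Years until a 50% chance the object/person is gone, at a constant annual rate.
    return math.log(0.5) / math.log(annual_survival)

s3_object = 1 - 1e-11          # 99.999999999% annual durability
human = 1 - 823.7 / 100_000    # 2014 CDC death rate, ~99.18% annual "durability"

print(nines(s3_object), half_life_years(s3_object))  # ~11 nines, ~69 billion years
print(nines(human), half_life_years(human))          # ~2 nines, ~84 years
```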

Cat pictures hold together with 11 9s: humans hold together with 2 9s.

It seems a little incongruous, yes? One is a chuckle-worthy image, and the other is a person.

I mean, there is a good reason: one is much more complicated than the other. Grumpy Cat herself will die far before her image does (maybe that's why she's grumpy?). We can barely simulate nematode neural systems, and even simply finding a human's brain connectome (connection graph) is still prohibitively expensive, much less running the entire graph forwards in time[5].

Instead of doing the naive thing the S3 analogy suggests and trying to scan people to replicate them across availability zones[6], we could simply extend their lives. For example, we boosted the general US life expectancy from 40 years to 80 years since the early 1800s[8]. But note:


y(t) = a \cdot e^{-b \cdot e^{-c \cdot t}}

It's not even "fuck the natural logarithm", it's "fuck the double logarithm"[7]. If we find some fantastic intervention in a pill that reduces our relative risk of death by half without any side effects, that halves the b value, which means this only moves the curve over a few years[10]:

[Figure: two Gompertz curves, one for normal humans and one for humans with half the relative risk (RR).]
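
If you want to see the size of that shift for yourself, here's a minimal sketch (in Python rather than the R script in the footnote, and using the standard Gompertz hazard parameterization h(t) = A·e^{G·t} rather than the curve form above; A, G, and the 8-year mortality doubling time are illustrative assumptions):

```python
import numpy as np

G = np.log(2) / 8    # hazard doubles roughly every 8 years (assumed)
A = 1e-4             # baseline hazard at age 0 (illustrative)
ages = np.arange(0, 121)

def survival(a, g, t):
    # S(t) = exp(-integral of the Gompertz hazard a*exp(g*t) from 0 to t)
    return np.exp(-(a / g) * (np.exp(g * t) - 1))

s_normal = survival(A, G, ages)
s_halved = survival(A / 2, G, ages)   # the miracle pill: RR = 0.5 at every age

median_age = lambda s: ages[np.argmax(s < 0.5)]
print(median_age(s_normal), median_age(s_halved))  # ~74 vs ~82: only ~ln(2)/G apart
```

Halving the hazard at every age buys you roughly one mortality doubling time, about 8 years, which is the whole complaint.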

We'll somehow need to invalidate this model with our mental fists.

(At this point, I should point out that there are some people working on the problem with an eye towards halting or reversing aging[11], like The SENS Foundation and The Methuselah Foundation. They are nonprofits, and could always use more money: if nothing else, they could make a bigger incentive prize of the XPrize sort.)

But I didn't write this post to complain about our problems, I wrote this post because:

  1. coining "9s of cats" was too tempting to pass up.
  2. consider this a weak post-pre-registration[12] of an informal study I did for well supported longevity actions we common folk can do today. Sure, the things we do are still subject to the steep demands of the Gompertz curve, but we want to maximize our chances of hitting Kurzweil's escape velocity if/when it happens.

Stay tuned.

Previously.


[1]  Note that this is for a specific object, and not for a set of objects. If you have 10 trillion objects, you might see one of them go missing in a year, and that would be within the guarantee.

[2]  If you want an example of the sorts of replication large tech companies use, you can check out Facebook's blob store.

[3]  Note that while I work for a competitor of Amazon, I don't intend for this to be a pleasant daydream, but a nightmarish one. Also, it bears repeating that I do not presume to speak for my employer, etc etc.

[4]  This doesn't even include things like Alzheimers, which destroy the people without destroying their bodies.

[5]  Contrast this with genome sequencing costs, which have dived faster than exponentially. Today, you can get your genome sequenced for around $1k (the cost is sitting behind a quote request, but I've heard from biologists that Illumina whole-genome sequencing is around that much. Veritas Genetics also has a quote for around that much). It's possible that high resolution scanning technology will hit a similar trend, but it might not.

[6]  Availability zones: broad sectors with non-overlapping support, the theory being that bringing down one zone doesn't bring down the others. Concretely, it would be harder to kill you for good if you had copies living in both Europe and Asia.

[7]  Quip appropriately lifted from Ra, the Space Magic chapter.

[8]  To be fair, that's going from "lol leeches for everyone" to "well, let's scrape your bones out and put them in another person, and hey presto, they stopped dying!".

[9]  More by Gwern on his longevity page.

[10]  Graphic generated using an R+ggplot2 script, available as a Github Gist. I use the same curve that Gwern does circa 2017.

[11]  There are arguments against extending human lifespans, like overcrowding, but that's silly. Droning on about the sanctity of death because it's the Dark Ages is fine, but defaulting to death because oh no there are problems to overcome is a damn defeatist attitude. If you haven't read Bostrom's The Fable of the Dragon Tyrant, it's a gentle storytale introduction to non-deathism.

[12]  A pre-registration, so I can't just sweep things under the table, and weak, because I've already done the bulk of the research and analysis.

Subdermal Scientific Delivery

Epistemic status: crap armchair theorizing.

PutANumOnIt points out that psychology is broken. Having read Robyn Dawes’ House of Cards and Andrew Gelman’s post on the replication crisis, I agree with him: it is kind of crappy that it’s been years since the replication crisis broke and still nothing seems to have changed.

However, I disagree with the shape of his reaction, both online and in person (I was in the same room with him and the psychology student). What he said was true and necessary, but his frustration wasn’t usefully channeled. I think that adding the 3rd Scott Alexander comment requirement[1], kindness, would have at least very minutely helped move us towards a world of better science.

Why kindness? Well, how could we fix psychology without it? Some fun ideas:

  • The government could set higher standards for the science it funds.
  • Scientific journals could uphold higher standards.
  • The universities that host the psychology professors could start demanding higher standards from the professors, like for granting tenure.
  • The APA (American Psychological Association) could publish guidelines pushing for higher standards[2].
  • Psychology curriculum writers could emphasize statistics more.

If we could do any one of these with a wave of a wand, it would… well, it wouldn’t end the crisis, but it would push things in the right direction.

However, we don’t have a wand, so I’m not confident any of these are going to happen with the prevailing business as usual.

  • The journals, APA, and curriculum writers solutions are recursive: the psychologists themselves are integral parts of those processes. It’s possible to push on non-recursive parts, like getting a key textbook writer to include an extra chapter on probabilistic pitfalls[3], but trying to hook a key figure is difficult[4].
  • Curriculum writers set their sights on the next generation, not the current one. It seems like the curriculum is already slowly changing, but waiting for the entire field to advance “1 death at a time” is kind of slow.
  • The government is going to move slowly, and special interests like pharmaceutical companies invested in softer standards would throw up (probably non-obvious) roadblocks. Also, the APA has much more cachet with the government than me or Andrew Gelman. David and Goliath is a morality tale, not a blueprint for wild success.

    Or, more concretely, how do you get psychologists to not tell their patients to call their congressmen, because they’re being put out of a job as collateral damage in a campaign for better science?[5]

And notice that these all add up to large efforts: what does it mean to convince the government to have higher standards for the science it funds? It’s an opaque, monolithic goal with an absolute ton of moving parts behind the scenes, most of which I’m blissfully ignorant of. These actions are so big that it’s easy to give in to the passive psychological warfare (ha!) and give up. It’s The Art of War: convincing people to accept defeat without even fighting, just by impressing them with the apparent momentum of the problem. What could one person do to turn that juggernaut?

In contrast, I want to focus on the opposite end of the scale; what if we tried to convince our lone psychology graduate student to consider better statistical methods?


But how? If you squint hard enough, it’s a sort of negotiation: we want the student to spend a non-trivial amount of time learning lots of statistics, while the student probably does not want to spend their Friday evenings reading about how to choose Bayesian priors. We need to convince the student that they should care, if not on Friday evening, then sooner rather than later.

Let’s borrow some ideas from the nauseatingly self-helpy book “Getting Past No”:

  1. “Go to the balcony”: make sure to step back and separate the frustration at poor science from the goal of getting better science.
  2. “Step to their side”: I imagine the psychologists would like to do good science, to take pride in their work and have it stand the test of time. However, just telling someone that there’s a replication crisis isn’t helping them deal with it, it’s putting yet another item on their stack full of things all clamoring for their attention while seeming vaguely negative. And how does it go? “No one ever got fired for choosing <field standard here>”. We will want something more positive…
  3. “Build them a golden bridge”: at the very least, we need to make it easy to use the better statistical methods[6], and offer support to those that are interested. Even better would be demonstrating that the methods we’re offering are better than the old and tired methods they’re using: for example, Jaynes recounts a story in “Probability Theory”, where geological scientists accused him of cheating because the Bayesian methods he used simply could not have been that good.

You’ll note that this is super abstract and not at all a blow-by-blow playbook for convincing anyone about scientific processes. Indeed, the entire process of starting with convincing a single graduate student is to figure out what the actual playbook is. Like in startup parlance, “do things that don’t scale”: even if I directly convinced 1 psychologist a day to use better statistical methods, America mints more than 365 psychologists in a year. But, if I instead found a message that tightly fit the profession and then posted that on the internet, there would be a chance that could take off. (More on this in the Appendix.)

At some point, it’s not enough to have a message that can convince graduate students: if we want to have an impact on timescales shorter than a generation, we’ll have to solve the hard problem of changing a field while most of the same people are working in it. So, an equally hand-wavey game plan for that scenario:

  1. Ideally, get one of their graduate students on board to provide trusted in-house expertise, and to find out what sorts of problems the research group is facing.
  2. Convince the local statistics professor to endorse you: that way, you can get past the first “this guy is a crank” filters.
  3. (¿¿¿) Somehow convince the professor, who probably wants to work more on his next grant application and less on learning arcane statistics, to consider your methods. Apply liberal carrot and stick[7] to refocus their attention on the existential threat slowly rolling towards them. (???)

I expect every community organizer to roll their eyes at my amateur hour hand waving around “and then we convince person X”. However, I am confident we do need to do the hard ground work to make the revolution happen.

In the end, I think we hope to make something like one of the following happen:

  • virally spread an 80/20 payload of better statistics among psychologists, and get a silent supermajority of psychologists who all adhere on the surface to current institutional norms, but who eventually realize “wait, literally all my colleagues also think our usage of p values is silly”, at which point a fast and bloodless stats revolution can happen.
  • move the psychology Overton window enough that an internal power struggle to institute better practices can plausibly succeed, led by psychologists that want to preserve the validity of their field.
  • in the course of convincing the entire field, figure out how to actually “statistical spearphish” up and coming field leaders, so they can save their field from the top[8].

So when I heard Jacob express a deep frustration to the student conveying “your methods are bad” (true) which was easily interpretable as “you should feel bad” (probably not intended), I saw the first step of the above revolution die on the vine. Telling people to feel bad (even unintentionally) is not how you win friends and influence people! To head off an obvious peanut gallery objection, it’s not like we’re allowing bad epistemology to flourish because oh no someone might find out they were wrong and feel bad so we can’t say anything ever. It is more pragmatic: compare trying to force someone to accept a new worldview, versus guiding them with a Socratic dialog to the X on the map so they unearth the truth themselves.

Maybe the common community that includes Jacob and me doesn’t want to devote the absolutely ludicrous resources needed to reform a field that doesn’t seem to want to save itself[9]. At the very least, though, we should try not to discourage those that come seeking knowledge, as our graduate student was.

And the alternative? That’s easy: we don’t do anything. Just let psychology spew bad results and eventually crash and bleed out, taking the scientific credibility lent to it down with it. I don’t think the field is too big to fail, but it sure would be inconvenient if it did.

(And since you’re the sort of person that reads this blog, then I might add that destroying a field focused on human-level minds right as a soft AI take off starts producing human-level complexity minds might be a poor idea[10].)

However, let’s raise the stakes: what if it’s not just psychology? I have a friend working in another soft-ish science field, closer to biology, and he reports problems there too. An upcoming post will in passing point out some problematic medical research. Again, I don’t think destroying psychology would bring down the entire scientific enterprise, but I do think destroying all fields as soft as biology would. So saving psychology is a way to find out if we can save science from statistical post-modernism; as the song goes “if you can make it there, you can make it anywhere”.

Maybe I’ll take up the cause. Maybe not[11]. If I do, more on this later.


Appendix: Other Actions, Other Considerations

Not everything is trying to convince people in 1-on-1 chats or close quarters presentations/workshops. Especially once we figure out what the scientists need and how we can get it to them, I think we’ll need:

  • better statistical material support geared towards working scientists. Similar to the website idea floated earlier in the post, having a central place that has all the practical wisdom will make it easier to scale.
  • better statistical packages that aren’t arcane and insane (looking at you, R), that do The Right Thing by default, and that warn when you’re doing the wrong thing and explain why it is wrong. However, this will likely end up living in the existing statistical ecosystems like R, since that’s where the users are. Similar to the previous point, this also includes better tutorial and usage support.

Other things would help, but are harder in ways I don’t even know how to start solving:

  • Like House of Cards recommends, we could not require therapists to do original research. That’s like requiring medical students to get unrelated undergrad degrees for a touch of class around the office: expensive, inflating the need for positive research, and of dubious help. Yes, reducing credentialism is difficult.
  • Stop requiring positive results for publication. This is the problem for most scientific fields, because you need publication to become a PhD, and you need positive results to publish because negative results aren’t exciting. So you get p-hacking to get published, because you’ve told people “lol sink or swim” and by god they’re going to bring illegal floaties.
  • Or, give negative replications more weight/publication room. This would have the negative effect that it’ll probably increase animosity in the field, and professionals don’t want that, so there will still be costs to overcome. Changing the culture to detach yourself from your results will be… difficult.

[1]  Scott Alexander’s blog, Slate Star Codex, has a comment policy requiring comments be true, necessary, or kind, with at least two of those attributes.

[2]  Sure, guidelines don’t cause higher standards directly, but it makes it much easier to convince people that pay attention, especially those that aren’t already entrenched.

[3]  This specific strategy is additionally prone to failure since teachers pick and choose what material to use from the textbook buffet, so a standalone section on statistics would likely go unused. An entire textbook using unfamiliar statistics would be an even tougher sell.

[4]  In case it’s not clear: trying to convince key figures that they should do a thing is difficult, because if they were easy to convince, then every crank that walked into their office could have the key figure off on their own personal goose chase.

[5]  Yes, there isn’t a 1-to-1 mapping between demanding better statistics and putting therapists out of their job. However, if things have to become legislative, then it seems likely the entire field of psychology will be under attack, with non-trivial airtime going towards people with an axe to grind about psychology. And heaven forbid it become a partisan issue, but when has heaven ever cared?

[6]  In this regard, Stan by Andrew Gelman and co looks pretty interesting, even if I have no idea how to use it.

[7]  Yes, carrot and stick. We’ll need to introduce discussion of negative consequences sooner or later: if not the future destruction of science, then maybe something about their legacy or pride, or whatever.

[8]  Unlikely for the same reasons included in a previous footnote, but included for completeness.

[9]  The field as a whole, not counting individual people and groups.

[10]  A thousand and one objections for why this is a bad analogy spring to mind, but I think we could agree that conditional on this scenario, it couldn’t be worse to have a functioning field of psychology than not.

[11]  Remember, aversion to “someone has to, and no one else will”.