2017 Review

If there’s a theme for my 2017, it seems to be FAILURE.

FAILURE at cultivating habits

  • Due to the addition of a morning standup at work, I noticed I was getting in much later than I thought. I could previously pass off some pretty egregious arrival times as “a one-time thing” to myself, but not when a hard deadline made it clear that this was happening multiple times a week. So I tried harder to get in earlier, and this had basically no impact.
  • I noticed I was spending a lot of time watching video game streaming; Twitch streams are long, regularly 4 hours, which would just vaporize an evening or an afternoon. It’s not so much that it was a ton of total time, but it was basically a monolithic chunk of time that wasn’t amenable to being split up to allow something to get done each day. I love you beaglerush, but the streams are just too damn long, so I decided I should stop watching game streams. However, I just felt tired and amenable to bending the rules at approximately the same rate as before, so my behavior didn’t really change.
  • I’m a night owl, to the extent that going to sleep between midnight and 3 AM probably covers 95% of my sleeping times, and the rest is heavily skewed towards after 3 AM. So I started tracking when I went to sleep, and had some friends apply social demerits when I went to sleep late. I got mildly better, but was still all over the place with my sleep schedule.

There’s a happy-ish ending for these habits, but first…

FAILURE at meeting goals

A year ago, I decided to have some resolutions. However, I didn’t want them to be year-long resolutions: a year is a long fucking time, and I knew I’d be pretty susceptible to…

  • falling off the wagon and then not getting back on, burning the rest of the year, or
  • mis-estimating how big a year-sized task would be, which would probably only become apparent near the middle of the year. If I got it really wrong, it would be months before I could properly try again.

So similarly to my newsletter tiers, I decided to break the year into fifths (quinters?), and resolved to do something for each of those. I hoped it would be long enough to actually get something done, while being short enough that I could iterate quickly.
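For the curious, here is a minimal sketch of how a year splits into quinters. It assumes an even split starting on January 1, which is an assumption for illustration only (the actual boundaries I used aren't spelled out here), and the `quinters` helper is a made-up name.

```python
from datetime import date, timedelta

def quinters(year):
    """Split a calendar year into five consecutive, near-equal date ranges.

    Assumption: the split is even and starts on January 1.
    """
    start = date(year, 1, 1)
    total_days = (date(year + 1, 1, 1) - start).days
    # Six boundaries mark the start of each fifth, plus the start of next year.
    bounds = [start + timedelta(days=round(i * total_days / 5)) for i in range(6)]
    # Each range is (first day, last day), inclusive.
    return [(bounds[i], bounds[i + 1] - timedelta(days=1)) for i in range(5)]

for n, (first, last) in enumerate(quinters(2017), start=1):
    days = (last - first).days + 1
    print(f"Quinter {n}: {first} to {last} ({days} days, {days / 7:.1f} weeks)")
```

Under that assumption, each 2017 quinter is 73 days, a bit over 10 weeks.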

So, how did I do?

Quinter 1

  • FAILURE. Finish the hardware part of project noisEE. Design turned out to be hard, did a design Hail Mary that required parts that didn’t get here before the end of the quinter.
  • Stretch FAILURE. Read all of Jaynes’ Probability Theory. Got only ~40% of the way through: it turns out trying to replicate all the proofs in a textbook is pretty hard.
  • FAILURE. Try to do more city exploratory activities. Planning and executing fun/interesting activities was more time consuming than anticipated, and I didn’t account for how much homebody inertia I harbored and how time consuming the other goals would be.
  • SUCCESS. Keep up all pre-existing habits.

Quinter 2

  • FAILURE. Finish project noisEE. It turns out the Hail Mary design was broken, who could have guessed?
  • SUCCESS (mostly). Make a NAS (network attached storage) box. Technically, the wrap up happened the day after the quinter ended.
  • SUCCESS. Keep up all pre-existing habits. Apparently attaining this goal isn’t a problem, so I stopped keeping track of this in future quinters.

Quinter 3

  • SUCCESS/FAILURE. Finish project noisEE, or know what went wrong while finishing. There was a problem with the 2nd Hail Mary, which I debugged and figured out, but it was expensive to fix, so I didn’t stretch to actually fix it. However, the next quinter I didn’t respect the timebox, which was the entire point of this timebox[1].
  • FAILURE. Make a feedback widget for meetups. After designing it, I discovered I didn’t want to spend the money to fabricate the feasible “worse is better” solution.
  • SUCCESS. Spend 20 hours on learning how to Live Forever. Spent 30+ hours on research.

Quinter 4

It’s about this time that I start enforcing goal ordering: instead of doing the easiest/most fun thing first, I would try to finish goals in order, so large and time consuming tasks don’t get pushed to the end of the quinter.

  • SUCCESS. Finish ingesting in-progress Live Forever research. Just wanted to make sure momentum continued from the previous quinter so I would actually finish covering all the main points I discovered I wanted to include.
  • SUCCESS (sad). Fix project noisEE, or give up after 4 hours. I gave up after 4 hours, after trying out some hacks.
  • SUCCESS. Write up noisEE project notes. Surprisingly, I had things to say despite not actually finishing the project, making the notes into a mistakes post.
  • FAILURE. Write up feedback widget design for others to follow. For some reason, I ignored my reluctance to actually build the thing and assumed I would value writing out potentially building the thing instead. Talk about a total loss of motivation.
  • SUCCESS. Write up the Live Forever research results, post about them. Includes practicing presenting the results a number of times.
  • Stretch FAILURE. Prep the meta-analysis checklist. Didn’t have time or the necessary knowledge.

Quinter 5

At this point, I’m starting to feel stretched out, so I started building in break times into my goal structure.

  • SUCCESS. Prepare to present the Live Forever research. Was probably too conservative here, I also planned to actually present, and there weren’t foreseeable things that would have prevented it from happening.
  • FAILURE. Take a week off project/goal work. I thought I would have only 1 week to prepare to present, but it turned into 2-3 weeks and broke up this break week, which was not nearly as satisfying.
  • SUCCESS. Redesign the U2F Zero to be more hobbyist friendly[2].
  • SUCCESS. Do regular Cloud™ backups[3][4].
  • SUCCESS. Take 1 week off at the end of the year. That’s when I’m writing this post!

Miscellaneous FAILURE

There’s so much FAILURE, I need a miscellaneous category.

Speaking of categories, I was organizing a category theory reading group for the first third of 2017 based on Bartosz’s lectures, but eventually the abstractions on abstractions got to be too much[5] and everything else in life piled on, and we ended up doing only sporadic meetups where we sat around confused before I decided to kill the project. In the end, we FAILED to reach functional programming enlightenment.

I’ve even started to FAIL at digesting lactose. It’s super sad, because I love cheese.


Why was there so much FAILURE this year?

Part of it is that I had more things to FAIL at. For example, I wouldn’t previously keep track of how I was doing at my habits, and color code them so I could just look at my tracker and say “huh, there’s more red than usual”. Or, I wouldn’t previously have the data to say “huh, I went to sleep after 3AM 2 weeks in a row”[6].

And in a way, I eventually succeeded: for each of the habits I listed earlier, I applied the club of Beeminder and hit myself until I started Doing The Thing. Does my reliance on an extrinsic tool like Beeminder constitute a moral failing? Maybe, but the end results are exactly what I wanted:

  • I got super motivated to build up a safety buffer to get into work early (even before getting my sleep schedule together!),
  • only broke Twitch abstinence twice since starting in May[7],
  • immediately went from an average sleeping time of 2AM to almost exactly 12:29[8].

And for goals, I opened myself up to FAILURE by actually making fine-grained goals, which meant estimating what I could do, and tracking whether I actually did them. In a way, there are two ways to FAIL: I could overestimate my abilities, or I could simply make mistakes and FAIL to finish what I otherwise would have been able to do. In practice, it seems like I tended to FAIL by overestimating myself.

It’s pretty obvious in retrospect: I started out by FAILING at everything, and then started cutting down my expectations and biting off smaller and smaller chunks until I actually hit my goals. Maybe I should have built up instead of cutting down, but I wanted to feel badass, and apparently the only way you can do that is by jumping in the deep end, so FAILING over and over it is. On the other hand, I think I just got lucky that I stuck it out until I got it together and started hitting my targets, so if you can do it by building upwards, that might work better.

Takeaways

So going forward what are the things I’d keep in mind when trying to hit goals?

  • Think through more details when planning. Saying “I will do all the proofs in Probability Theory” is fine and good, but there’s only so much time, and if you haven’t worked even one of the proofs, then it’s not a goal, it’s a hope and a prayer. Get some Fermi estimates in there, think about how long things will take and what could go wrong (looking at you, hardware turn-around times[9]).
  • If you’ve never done a similar thing before, then estimating the effort to hit a certain goal is going to be wildly uncertain. Pare the goal way down, because there are probably failure modes you’re not even aware of. For example, “lose 5 pounds” would be a good goal for me, because I’ve fiddled with the relevant knobs before and have an idea about what works. “Make a coat from scratch” is a black box to me, hence not a good goal. Instead, I might aim for “find all the tough parts of making a coat from scratch”, which is more specific, more amenable to different approaches, and doesn’t set up the expectation of some end product that is actually usable[10].
  • Relatedly, 10 weeks (about the length of a quinter) is not a leisurely long time. Things need to be small enough to actually fit, preferably small enough to do in a sprint near the end of the quinter. I know crunch time is a bad habit carried over from my academic years, but old habits die hard, and at least the things get done.
  • Build in some rest. I pulled some ludicrous hours in the beginning of the year, and noticed as time went on that I seemed less able to put in a solid 16 hours of math-working on the weekends. My current best guess is that I haven’t been taking off enough time from trying to Do The Thing, so I’m building in some break times.
  • Don’t throw away time. You’ll notice that I kept the noisEE dream alive for 4 quinters, each time trying tweaks and hacks to make it work. It’s clear now that this is a classic example of the sunk cost fallacy, and that I should have either spent more time at the beginning doing it right, or just let it go at that point.

    Another way to throw away time is to try and do things you don’t want to do. My example is trying to make/post the feedback widget, which is pretty simple, but I discovered I couldn’t give any shits about it after the design phase. This isn’t great, because I said I wanted to do the thing, and not doing the thing means you’re breaking the habit of doing the things you’ve set out to do (from Superhuman by Habit). Unfortunately, I’m still not sure how to distinguish when you really want to do something versus when an easily overridden part of yourself thinks it’s virtuous to want to do something, which is much less motivating.

  • Goal hacks might be useful. Looking at it, the main hack I used was timeboxes, which worked sometimes (total longevity research came in within a factor of 2 of my timebox estimate) and not so well in others (noisEE overflowed). It seems to be most useful when I’m uncertain how much actual work needs to be done to achieve some goal, but I still want to make sure work happens on it. After working on it for some number of hours, it should be clearer how sizable the task is and it can get a more concrete milestone in the next round.

    Stretch goals might also work, but making things stretch goals seems like a symptom of unwanted uncertainty, and they tend to be sized such that they never actually get hit. Unless I find myself stagnating, I plan on just dropping stretch goals as a tool.

  • If you’re not doing the thing because of something low-level like procrastination, a bigger stick to beat yourself with might help. Beeminder is my stick of choice, with the caveat that you need to be able to be honest with yourself, and excessive failure might just make you sad, instead of productive.

    (As a counterpoint, you might be interested in non-coercive ways to motivate yourself, in which case you might check out Malcolm Ocean’s blog.)
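As a sketch of the Fermi-estimate takeaway above, here is the kind of back-of-the-envelope check that would have flagged the Probability Theory goal. It uses the one data point from this post (about 40% of the book in one quinter) and assumes a constant pace, which is a big assumption; `quinters_needed` is a hypothetical helper, not something I actually used.

```python
def quinters_needed(fraction_done, quinters_spent=1.0):
    """Extrapolate the total effort for a task from partial progress.

    Assumption: the pace stays constant, so total = spent / fraction done.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return quinters_spent / fraction_done

WEEKS_PER_QUINTER = 52 / 5  # about 10.4 weeks

# ~40% of the book in one quinter, per the Quinter 1 results.
total = quinters_needed(0.40)
print(f"Estimated total: {total:.1f} quinters, "
      f"about {total * WEEKS_PER_QUINTER:.0f} weeks")
```

Working a single proof up front and multiplying out would give the same sort of number before committing, instead of after.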


Despite all the FAILURE, I think I agree with the sentiment of Ray’s post: over the past few years, I’ve started getting my shit together, building the ability to do things that are more complicated than a single-weekend project and the agency to pursue them.

That said, most of the things I finished this year are somewhat ancillary, laying the groundwork for future projects and figuring out what systems work for me. Now that I’ve finished a year testing those systems and have some experience using them, maybe next year I can go faster, better, stronger. Not harder, though, that’s how you burn out.

Well, here’s to 2018: maybe the stage I set this year will have a proper play in the next.


[1]  Thinking about it, timeboxes fall into two uses. You either want to make a daunting task more tractable, so you commit to only doing a small timebox, and if you want to keep going then that’s great! However, the other timebox is used to make sure that some task that would otherwise grow without bound stays bounded. I intended for the noisEE timebox to be used in the bound fashion, so when I kept deciding to keep working on it, that meant the timebox was broken.

[2]  This project does not have a post yet, and may never have one. Hold your horses.

[3]  Offsite backups are an important part of your digital hygiene, and the Butt is the perfect place to put them.

[4]  If people really want it, I can post about my backup set up.

[5]  Don’t worry, it’s easy, an arrow is like a functor is like an abstract transformation!

[6]  Knowledge is power, France is bacon.

[7]  One of these wasn’t Twitch at all, but a gaming stream I accidentally stumbled across on YouTube, but that still counts.

[8]  HMM I WONDER WHEN I SET MY SLEEP DEADLINE.

[9]  Unless you’re willing to pay out the nose, getting boards on a slow boat from China takes a while.

[10]  The tradeoff is that the 2nd goal is more nebulous: how do you know that you’ve found all the tough parts of making a coat? Maybe timeboxes would help in this case.

Ain’t No Calvary Coming

Epistemic status[1]: preaching, basically. An apology, in both senses[2].

I know my mom reads my blog; hi, mom.

Mothers being mothers, I figure I owe her a sit-down answer to why I’m not Christian, and don’t expect to re-become Christian[3]. Now, I don’t expect to convince anyone, but maybe you, dear reader, will simply better understand.


Let’s start at the end.

Let’s start with the agony of hell, and the bliss of heaven. Sure, humans don’t understand infinities, don’t grasp the eye-watering vastness of forever nor the weight of a maximally good/bad time. Nevertheless, young me had an active imagination, so getting people out of the hell column and into the heaven column was obviously the most important thing, which made it surprising that my unbeliever friends were so unconcerned with the whole deal. I supposed that they already had a motivated answer in place: as heathens, they would be wallowing in unrepentant hedonism, and would go to great lengths to make sure they kept seeing a world free of a demanding and righteous God.

I knew the usual way to evangelize, but it depended to a frustrating degree on the person being evangelized to. It seemed unacceptable that some of my friends might go to hell just because their hearts were never in the right place. Well, what if I found a truly universal argument for my truly universal religion? The Lord surely wouldn’t begrudge guidance in my quest to find the unmistakable fingerprints of God (which were everywhere, so the exercise should be a cakewalk), and I would craft a marvelous set of arguments to save everyone.

Early on, I realized that the arguments I found persuasive wouldn’t be persuasive to the people I wanted to reach: if you assumed the Bible was a historical text you would end up saying “no way, Jesus did all these miracles, that’s amazing!”, but what if you didn’t trust the Bible? I would need to step outside of the assumption that God existed, and then see the way back. Was this dangerous to my faith? Well, I would never really leave: I would just be empathetic and step into my friend’s shoes, to better know how to guide them into the light. And you remember the story about walking with Jesus on the beach? There was no way this could go wrong!

Looking back, I see that my thoughts were self-serving. As a product of both faith and science, I wanted to make it clear that religion could meet science on its own terms and win. If the hierarchy of authority didn’t subordinate science to religion, then…?

So I studied apologetics[4], particularly Genesis apologetics. I made myself familiar with things like young vs. old earth creationism, the tornado-in-a-junk-yard equivocation, attacks I could make on gradualism and punctuated equilibrium[5]. I was even dazzled by canopy theory, where a high-altitude aerial ocean wrapped the planet, providing waters for The Flood and allowing really long lifespans by blocking harmful solar radiation[6]. I went on missions, raising money and overcoming my natural reticence to talk to people about the Good Word. I even listened almost solely to Christian rock music.

Now, I don’t doubt I believed: I felt the divine in retreats and mission trips, me and my brothers and sisters in Christ singing as one[7]. I prayed for guidance, hung on the words of holy scripture, found the words for leading a group prayer, and eventually confirmed my faith. As part of my confirmation, I remember being baptized for the 2nd time in high school[8]: a clear, lazy river had cut a gorge into sandstone, and the sunset lit the gorge with a warm glow. Moments before I went under the water, I thought “of course. How could I doubt with such beauty in front of me?”.

But some of these experiences also sowed the seeds of doubt. Someone asked if I wanted the blessing of tongues: I said yes, thinking a divine gift of speaking more than halting Spanish would be great for my upcoming mission trip. And, how cool would it be to have a real world miracle happen right in front of me‽ Later I tried to figure out if glossolalia was in fact the tongue of angels[9], but I didn’t come up with anything certain, which was worrying. Why were my local leaders enthusiastic about this “gift of tongues”[10], but other religious authorities were against the practice? On a mission trip I told someone I could stay on missions indefinitely (in classic high school fashion, I had read the word “indefinite” a few times and thought it sounded cool) and was brought up short when they responded with skepticism that someone could stay forever; why wouldn’t they stay if the work was righteous, comfortable living be damned? Or I would think about going to seminary instead of college, and wonder if that was God’s plan for me.

How did I know what was right, what was true?

The thing is that I didn’t even begin to know. On my quest for answers, I didn’t comprehend the sheer magnitude of 2000 years of religious commentary[11]. I didn’t grasp how hairy the family tree of Christian sects was, each with their own tweaks on salvation. I read Mere Christianity and a few books on apologetics, and thought it would be enough. I didn’t even understand my enemy at all, refusing to grapple with something so basically wrong as The Selfish Gene. Into this void on my map of knowledge I sailed a theological Columbus, expecting dragons where there was a whole continent of thought.

So the more I learned, the more doubt compounded. When my church split, I wondered why such a thing could happen: were some of the people simply wrong about a theological question? That raised more disturbing questions about how one could choose the truest sect of Protestant Christianity, ignoring “cults” like Mormonism or Catholicism or Eastern Orthodox or even other religions entirely, like Islam (and there are non-Abrahamic religions, too‽). Or, maybe a church split could happen for purely practical concerns, but it was disturbing that such an important event in a theological institution wasn’t grounded in theological conflict: if not a church split, then what should be determined by theology?[12] And, I realized other religions had followers with similarly intense experiences: what set mine apart from theirs?

Again, what did I know, and how did I know it?

Don’t worry, my spiritual leaders would say. God(ot) is coming, just wait here by this tree and he’ll be along any moment now[13].

And maybe God would come, but he would maintain plausible deniability, an undercover agent in his own church. Faith healings wouldn’t do something so visible as give back an arm, just chase away the back pain of a youth leader for a while. My church yelled prayers over a girl with a genetic defect, and the only outcome was frightening her[14]. Demonic possession leading to supernatural acts isn’t a recorded phenomenon, despite the proliferation of cameras everywhere. So the whispers of godhood would always scurry behind the veil of faith whenever a light of inquiry shone on it.

I started refusing to stand during praise. Singing with this pit of questions in my stomach seemed too much like betrayal, displaying to the world smiles and melodies I knew were empty. I sat and thought instead, trying to retrace Kant’s Critique of Pure Reason without Kant’s talent[15]. I couldn’t accept the dearth of convincing evidence and simply trust, when all my instincts and training screamed for a sure foundation, when I knew a cosmic math teacher would circle my answer of “yes, God exists!” and scribble in red “please show your work”.

I told myself I would end it in a blaze of glory, pledging fealty to a worthy Lord, or flinging obscenities at the sky and pulpit when they didn’t have the answers. Instead, my search for god outside of god himself petered out under a pile of unanswered questions[16], and I languished in a purgatory of uncertainty. In a way, I was mourning the death of god. It took years, but now I confidently say I’m an atheist.


So that was the past. What about the future?

Sometimes the prodigal son falls on hard times and has to come home; in the case of the church, home has a number of benefits. Peace of mind that everything will turn out okay. A sabbath, if one decides to keep it. A set of meditation-like practices at regular intervals (even in Christianity!). A set of high-trust social circles[17] with capped vitriol (in theory; in practice, see the Protestant Reformation and aforementioned church splits), a supportive community with a professional leader, a time to all feel together. Higher levels of conscientiousness. Higher productivity[18]. The ability to attract additional votes in Congressional races. Chips at the table of Pascal’s Wager[19].

Perhaps most important, though, is a sense of hope. How does one have hope for the future when there is only annihilation at the end?

Paul saw the end, a world descending into decadence, a world that couldn’t save itself: hell, given a map, it wouldn’t save itself. Contrary to this apocalyptic vision, scientism[20]/liberalism preaches abundance, the continual development of an ever better world. We took the limits of man and sundered them; we walked on the moon, we eradicated polio, we tricked rocks into thinking for us, and we’ll break more limits before we’re done. Paul was the product of an endless cycle of empires; we’re on a trajectory to leave the solar system[21].

There is light in the world[22], and it is us.

But if the world is simply getting better, then does it matter what I believe? Well, our rise is only part of the story: it took tremendous work to get from where we were to where we are, and the current world is built on the blood of our mistakes[23]. The double-edged sword of technology could easily lop off our hand if we’re not careful. We’ve done some terrible things already, and finding the Great Leap Forward-scale mistakes with our face is hideously expensive.

So progress is possible, but we haven’t won. How do the engineers say it? “Hope is not a strategy.” There ain’t no Calvary coming[24], ain’t no Good King to save us, ain’t no cosmic liquidation of the global consciousness, ain’t no millennium expiration date on suffering. A reductionist scientific world is a cold world without guardrails, with nothing to stop us from destroying ourselves[25]: if we want a happy ending, we’ll need to breach Heaven ourselves, and bowing our heads and closing our eyes in prayer won’t help when we should be watching the road ahead. It’s going to be a lot of hard work, but this isn’t a cause for despair. This is a call to arms.

So in the past, a successful prodigal son may have gone home for a sense of continuity and purpose, a sense of hope beyond the grave. However, now he doesn’t have to. It’s not just about unrepentant hedonism[26]: we’re getting closer to audacious goals like ending poverty, ending aging, ending death. We won’t wait for a bright new afterlife that isn’t coming: we humanists will do our best, and maybe, just maybe, it will be enough.

No heaven above, no hell below, just us. Let us begin.


[1]  Epistemics: the ability to know things. Epistemic status: how confident I am about the thing I am writing about.

[2]  Senses: saying sorry, and in the sense of apologetics or defending a position. Commonly found as the bi-gram “Christian apologetics”.

[3]  I almost didn’t publish this post, figuring I hadn’t heard from my mom about faith-related topics in a while. Then my mom told all my relatives “We are praying for a godly young woman who can bring <thenoviceoof> back to us”, so here we are.

[4]  A defense of the faith, basically, usually hanging around as a bi-gram like “Christian apologetics”. See Wikipedia.

[5]  Standing from where I am, I can see how the books would paint the strengths of science as weakness: “look at how science has been wrong! And then it changed its mind, like a shifty con-man!” In this respect, the flip-flopping nature of science journalism in fields like nutrition is Not Helping, a way of poisoning the well of confident proclamations of evidence, such that everyone defaults to throwing up their hands in the face of evidence, instead of actually assessing it.

[6]  In retrospect, I had a thing for weirdly implausible theories: I remember being smitten with the idea that all of physics could be explained by continually expanding subatomic particles, a sort of classical Theory of Everything that no one asked for, with at least one gaping hole you could drive trucks through (hint: how do satellites work?).

[7]  We even cautioned ourselves against “spiritual highs”. We would feel something, but the something wouldn’t always be there, which maybe should have tipped me off about something fishy happening. How do they say it, “don’t get high off your own supply”?

[8]  Many children are baptized soon after birth, and confirmed at some later age when they can actually make decisions. Hmm.

[9]  Now, I know that I could tell by listening for European capitals.

[10]  I didn’t actually get to the point of spewing glossolalia: I could hear my youth group leader’s disappointment that I didn’t quite let myself go while repeating “Jesus, I love you” faster than I could speak. And, finding out that no earthly audience would have understood what I was saying was also a shock, like finding out God solely communicated to people through grilled cheeses.

[11]  Talk about being bad at grasping infinities: I couldn’t even grasp 2000 years. “More things than are dreamt of in your philosophy”, etc.

[12]  The obvious rejoinder is that the church is still an earthly institution, and it’s still subject to mundane concerns like balancing the budget: for every Protestant Reformation grounded in theological conflict, there’s another hundred grounded in conflicts over the size of the choir, all because we live in a fallen world. The general counter-principle is that if there’s no way to tell from the behavior of churches whether we’re in a godly or godless world, then the fact there exists a church ceases to count as evidence.

[13]  The fact that some biblical scholars translate “cross” as “tree” makes me suspicious that Waiting for Godot was in fact making this exact reference.

[14]  I didn’t partake; this was after I started being weirded out by the charismatics.

[15]  I’m disappointed I didn’t throw up my hands at some point and yell “I Kant do it!”

[16]  Sure, there were answers, but they weren’t satisfying. You couldn’t get there from here.

[17]  Of course, the trust comes at a price; I wouldn’t want to be trans in a small tight-knit fundamentalist town.

[18]  It’s not clear from the abstract of the paper, but in Age of Em Robin Hanson cites this paper as showing the religious have higher productivity.

[19]  Mostly not serious, since I would expect a jealous Abrahamic God to throw out any spiritual bookies. Also keep in mind that Pascal’s wager falls apart even with the simple addition of multiple gods competing for faith.

[20]  I am totally aware that scientism is normally derogatory. However, science itself doesn’t require the modes of thought that we normally attribute to our current scientific culture.

[21]  One might worry that we would simply export our age-old conflicts and flaws to the stars, in which case they might become… bear with me… the Sins of a Solar Empire?

[22]  “Run for the mountains!” said Apostle Paul. “It is the dawn of the morning Son!” Then Oppenheimer said “someone said they were looking for a dawn?”

[23]  Sapiens notes “Haber won the Nobel prize in chemistry. Not Peace.”

[24]  I’m sorry-not-sorry about the pun. If you don’t get it, Calvary is the hill Jesus supposedly died on, and “ain’t no cavalry coming” is a military saying: there’s no backup riding in to save the day.

[25]  Nukes are traditional, if less concerning these days. Pandemics are flirting on the edge of global consciousness, AI is getting more serious, and meta-things like throwing away our values and producing a “Disneyland without children” are becoming more concerning.

[26]  Just look at what the effective altruists are doing with their 10%.

The Mundane Science of Living Forever


Epistemic Status: timeboxed research, treat as a stepping stone to more comprehensive beliefs. Known uncertainty called out.

Live forever, or die trying!

Previously: Lifestyle interventions to increase longevity @ LessWrong9s of cats.

TLDR?

Yes, Immortality

I wrestled with whether to shoot for a more normal and mundane title, like “In Pursuit of longevity”, but “live a long time!” just doesn’t have the ring that “live forever!” does.

Clarification: I don’t have the Fountain of Youth. I’m relying on the future to do the heavy lifting. Kurzweil’s escape velocity is the key idea: we want to live long enough that life expectancy starts increasing by more than 1 year per year. Life expectancy is currently stagnant, so we want to live as long as possible to maximize our chances of hitting some sort of transition.
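To make the "more than 1 year per year" threshold concrete, here is a toy sketch. It is a deterministic cartoon rather than a real actuarial model: each calendar year you use up one year of remaining life expectancy and get back some hypothetical `gain_per_year` from medical progress. The function name, parameters, and numbers are all made up for illustration.

```python
def years_until_death(remaining, gain_per_year, horizon=200):
    """Count calendar years survived in a toy escape-velocity model.

    remaining: expected years of life left today (hypothetical input).
    gain_per_year: years of life expectancy added per calendar year
        by medical progress (hypothetical input).
    Returns None if still alive when the simulation horizon is reached.
    """
    for year in range(1, horizon + 1):
        remaining += gain_per_year - 1  # age one year, gain some back
        if remaining <= 0:
            return year
    return None

for gain in (0.0, 0.5, 1.2):
    print(f"gain={gain}: {years_until_death(40, gain)}")
```

With no gains, 40 remaining years last 40 years; at 0.5 years gained per year they stretch to 80; above 1 the countdown never reaches zero, which is the whole idea.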

In other words, we need silver bullets to overcome the Gompertz curve, but there are no silver bullets yet, just boring old lead bullets. We’ll have to make our own silver bullet factory, and use the lead bullets to get there.
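For reference, the Gompertz curve says that the adult mortality rate grows exponentially with age, hazard(age) = a * exp(b * age). Below is a minimal sketch; the baseline `a` and the 8-year doubling time are illustrative round numbers and not fitted to any dataset, so treat the outputs as shape, not prediction.

```python
import math

def gompertz_hazard(age, a=1e-4, doubling_years=8.0):
    """Yearly mortality hazard under a Gompertz law, a * exp(b * age).

    a and doubling_years are illustrative assumptions, not fitted values.
    """
    b = math.log(2) / doubling_years
    return a * math.exp(b * age)

def gompertz_survival(age, a=1e-4, doubling_years=8.0):
    """Probability of surviving from age 0 to `age` under the same law.

    This is exp(-integral of the hazard) = exp(-(a / b) * (exp(b * age) - 1)).
    """
    b = math.log(2) / doubling_years
    return math.exp(-(a / b) * (math.exp(b * age) - 1))

for age in (30, 50, 70, 90):
    print(f"age {age}: hazard {gompertz_hazard(age):.4f}, "
          f"survival {gompertz_survival(age):.3f}")
```

The exponential is why lead bullets only go so far: cutting your risk by a constant factor buys a fixed number of years, while the curve keeps doubling.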

So, the bulk of this post will be devoted to simply living healthily. A lot of the advice is boring and standard: eat your vegetables, exercise, get enough sleep. However, I wanted to check out the science and see what holds up under (admittedly amateur) scrutiny.

(I’ll be ignoring the painfully obvious things, like not smoking. If you’re smoking, stop smoking[1].)

My process: I timeboxed myself to 20 hours of research, ending in August 2017. First, I looked up the common causes of death and free-form generated possible interventions. Then, I followed the citations in the Lifestyle interventions to increase longevity post and then searched Google Scholar, especially for meta-analyses, and read the studies, evaluating them in a non-rigorous way. I discarded interventions that I wasn’t certain about: for example, Sarah lists some promising drugs and gene therapies but based only on animal studies, where I wanted more certainty. I ended up using 30+ hours, so not everything is exhaustively researched as much as I would like: for example, there was a fair amount of abstract skimming. I did not read every paper I reference end-to-end. On the other hand, many papers were also locked behind paywalls so I couldn’t do much more than that.

This means if you read one of these results and implement it without talking to your doctor about it and bad things happen to you, I will ask you: ARE YOU A SPRING LAMB? WHY THE FUCK ARE YOU DOING THINGS A RANDOM PERSON ON THE INTERNET TOLD YOU TO DO? AND WITHOUT VETTING THOSE THINGS?

Or more concretely: you are a unique butterfly, and no one cares except the medical world. What happens for the faceless statistical masses might not happen for you. I will not cover every single possible interaction and caveat, because that is what those huge medical diagnosis books are for, and I don’t have the knowledge to tell you about the contents of those books. Don’t hurt yourself, ask your doctor.

An example: blood donation

First, I wanted to lead with an example of how the wrong methods can cripple a conclusion and end up with bad results.

Now, blood donation looks like it is very, very good for male health outcomes. From “Blood donation and blood donor mortality after adjustment for a healthy donor effect.” with 1,182,495 participants (N=1,182,495) published in 2015 (note it’s just an abstract, but the abstract has the data we want):

» For each additional annual blood donation, the all-cause mortality RR (relative risk) is 0.925, with a 95% CI (confidence interval) from 0.906 to 0.943[2]. I’ll be summarizing this information as RR = 0.925[0.906, 0.943] throughout the post.

(Unless otherwise stated, in this post an RR measure will refer to all-cause mortality, and X[Y, Z] CI reports will be values followed by 95% confidence intervals. There will also be references to OR (odds ratio) and HR (hazard ratio)).
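If you want to collect these numbers in a spreadsheet or script, here is a minimal sketch of reading the X[Y, Z] shorthand and checking whether the interval excludes "no effect" (1.0). The names `Estimate` and `parse_estimate` are made up for illustration, and the check is only the crude "does the CI cover 1.0" test, not a substitute for reading the study.

```python
import re
from typing import NamedTuple


class Estimate(NamedTuple):
    kind: str    # "RR", "HR" or "OR"
    point: float # point estimate
    low: float   # lower bound of the 95% CI
    high: float  # upper bound of the 95% CI

    def excludes_no_effect(self) -> bool:
        """True when the whole interval sits on one side of 1.0."""
        return self.high < 1.0 or self.low > 1.0


# Matches strings such as "RR = 0.925[0.906, 0.943]".
_PATTERN = re.compile(
    r"(RR|HR|OR)\s*=\s*([\d.]+)\s*\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]"
)


def parse_estimate(text: str) -> Estimate:
    """Pull the first RR/HR/OR estimate out of a line of text."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"no estimate found in {text!r}")
    kind, point, low, high = match.groups()
    return Estimate(kind, float(point), float(low), float(high))


donation = parse_estimate("RR = 0.925[0.906, 0.943]")
print(donation, donation.excludes_no_effect())
```

An interval that excludes 1.0 only says the data disagree with "no association"; it says nothing about confounders like the ones discussed in this section.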

There’s even a well fleshed out mechanism, where iron ends up oxidizing parts of the cardiovascular system and damaging it, and hence doing regular blood donation removes excess blood iron.

But there are some possible confounders:

  • blood donation carries some of the most stringent health screens most people face, which results in a healthy donor effect,
  • altruism could be correlated with conscientiousness, which might affect health outcomes.

The study cited earlier is observational: they’re looking at existing data gathered in the course of normal donation and studying it to see if there’s an effect. In order to make a blanket recommendation that men should donate blood at some regular interval, what we really want is to isolate the effect of donation by putting people through the normal intake and screening process, and then right before putting the needle in randomize for actually taking the donation or not, or even stick the needle in and not actually draw blood.

(Note that randomization is not strictly better than observational studies: observations can provide insights that randomization would miss[3], and a rigorous RCT might not match real world implementations. Nevertheless, most of the time I want a randomized trial.)

No one had done an RCT (randomized controlled trial) in this fashion, and I expect any such study to have a really hard time passing an ethics board when I get numerous calls to help alleviate emergency blood need at a number of times throughout the year.

However, Quebec noticed that their screening procedures were too strict: a large group of people were being rejected when they were in fact perfectly healthy. The rejection trigger didn’t appear to otherwise correlate with health, so this was about as good a randomized experiment as we were going to get. Their results were reported in “Iron and cardiac ischemia: a natural, quasi-random experiment comparing eligible with disqualified blood donors” (2013, N=63,246):

» Donors vs nondonors, RR = 1.02[0.92, 1.13]

In other words, there was basically no correlation. In fact, in another section of the paper the authors could get the correlation to come back by slicing their data in a way that better matched the healthy donor process.

The usual hallmarks of science laypeople can pick apart aren’t there: the N is large, there’s a large cross-section of the community (no elderly Hispanic women effect) and there’s no way to even fudge our interpretation of the numbers: we’re not beholden to science’s fetish with p=0.05, so failing the 95% CI could be okay if it were definitely leaning in the right direction. But it’s almost exactly in the middle. The effect isn’t there or is so tiny that it’s not worth considering.


So that’s an example of how things can look like great interventions, and then turn out to have basically no effect. With our skeptic hats firmly in place, let’s dive into the rest!

Easy, Effective

Vitamin D

Vitamin D gets the stamp of approval from both Cochrane and Gwern[4]. Lots of big randomized studies have been done with vitamin D supplementation, so the effect size is pretty pinned down.

From “Vitamin D supplementation for prevention of mortality in adults” (2012, N=95,286, Cochrane):

» Supplementation with vitamin D vs none, RR = 0.94[0.91, 0.98]

Another meta-analysis used by Gwern, “Vitamin D with calcium reduces mortality: patient level pooled analysis of 70,528 patients from eight major vitamin D trials” (2012, N=70,528):

» Supplementation with vitamin D vs none, HR = 0.93[0.88, 0.99]

You might think that one side of the CI is pretty bad, since RR = 0.98 means the intervention is almost the same as the control. On the other hand, (1) wait until you read the rest of the post, and (2) keep in mind that it’s very cheap to supplement vitamin D. Your local drugstore probably has a year’s worth for $20. In a pinch, more sunlight also works, but if you have darker skin, sunlight is less effective.

If you’re interested, there’s lots of hypothesizing on the mechanisms by which more vitamin D impacts things like cardiovascular health (overview).

(If you want a striking visual example of vitamin D precursors correlating with cancer, there’s a noticeable geographic gradient in certain cancer deaths; “An estimate of premature cancer mortality in the U.S. due to inadequate doses of solar ultraviolet-B radiation” (2002) states that some cancers are twice as prevalent in the northern US as in the southern. There’s more sun in the south, and sunlight helps synthesize vitamin D. Coincidence?! If you want to, you can see this effect yourself by going to the Cancer Mortality Maps viewer from the National Cancer Institute and taking a look at the bladder, breast, corpus uteri or rectum cancers.)

Difficult, but Effective

Exercise

Exercising is hard work, but it pays off big.

From “Domains of physical activity and all-cause mortality: systematic review and dose–response meta-analysis of cohort studies” (2011, N=unknown subset of 1,338,143[5]):

» Comparing people that get 300 minutes of moderate-vigorous exercise/week vs sedentary populations, RR = 0.74[0.65, 0.85]

Unfortunately, “moderate-vigorous” is pretty vague, and the number of multiple comparisons being made is breathtaking.

MET-h is a unit of energy expenditure roughly equivalent to sitting and doing nothing for an hour. Converting different exercises (or intensities of exercise) to MET-h measures allows directly comparing/aggregating different exercise data. This also makes it easier to decide exactly what “moderate-vigorous” exercise is: intensity roughly maps to less than 3 METs for light, 3-6 METs for moderate, and above 6 METs for vigorous, and an hour at a given intensity burns that many MET-h.

With this in mind, we can get a regression seeing how additional MET-hs impact RR. From the previous study (2011, N=unknown subset of 844,026):

» +4 MET-h/day, RR = 0.90[0.87, 0.92] (roughly mapping to 1h of moderate exercise)

» +7 MET-h/day, RR = 0.83[0.79, 0.87] (roughly mapping to 1h vigorous exercise)
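To make the unit concrete, here is a rough sketch of converting a workout into MET-h and of checking whether the two regression points above are consistent with each other. The log-linear scaling in `scaled_rr` is my assumption for illustration (the study fits its own dose-response curve), and the function names are made up.

```python
import math


def met_hours(met_intensity: float, minutes: float) -> float:
    """Energy expenditure in MET-h: intensity (in METs) times duration (in hours)."""
    return met_intensity * minutes / 60.0


def scaled_rr(rr_reference: float, reference_met_h: float, target_met_h: float) -> float:
    """Scale an RR to a different dose, ASSUMING the log of the RR is linear in MET-h."""
    return math.exp(math.log(rr_reference) * target_met_h / reference_met_h)


# One hour at 4 METs (moderate) is 4 MET-h.
print(met_hours(4.0, 60))  # 4.0

# Extrapolating the +4 MET-h/day result (RR = 0.90) out to +7 MET-h/day.
print(round(scaled_rr(0.90, 4.0, 7.0), 2))  # 0.83
```

Under that assumption, the +4 MET-h/day figure extrapolates to about the reported +7 MET-h/day figure, so the two points look like they come from the same roughly log-linear fit, at least over this range.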

There’s a limit, though: exercising for too long, or too hard, will eventually stop providing returns. The same study places the upper limit at around a maximum RR = 0.65 when comparing the highest and lowest activity levels. The Mayo Clinic in “Exercising for Health and Longevity vs Peak Performance: Different Regimens for Different Goals” recommends capping vigorous exercise at 5 hours/week for longevity.

A quick rule of thumb is that each hour of exercise can return 7x time dividends (news article). This sounds great, but do some math: put this return together with the 5 hours/week limit, assume that you’re 20yo and doing the maximum exercise you can until 60, and this works out to adding roughly 8 years to your life (note that the study the rule of thumb is based on (2012) gives a slightly lower average maximum gain, around 7 years). Remember the Gompertz curve? We can huff and puff to get great RRs, and it only helps a bit. Unfortunate.
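The arithmetic behind that estimate can be sketched as follows. The first function is just the rule of thumb multiplied out. The second is a separate back-of-the-envelope model: if mortality follows a Gompertz curve, multiplying the hazard by a constant RR is the same as shifting the curve later by a fixed number of years. The 8-year mortality doubling time is an assumed, commonly quoted ballpark that does not come from the studies above, and both function names are made up.

```python
import math

HOURS_PER_YEAR = 24 * 365.25


def exercise_dividend_years(hours_per_week: float, years_of_habit: float,
                            return_multiple: float = 7.0) -> float:
    """Years gained if every hour of exercise returns `return_multiple` hours of life."""
    exercise_hours = hours_per_week * 52 * years_of_habit
    return exercise_hours * return_multiple / HOURS_PER_YEAR


def gompertz_shift_years(rr: float, doubling_time_years: float = 8.0) -> float:
    """Years by which a constant hazard multiplier `rr` delays a Gompertz curve.

    With hazard h(t) = a * exp(b * t), rr * h(t) equals h(t - shift)
    where shift = -ln(rr) / b and b = ln(2) / doubling_time.
    ASSUMPTION: the 8-year default doubling time is a ballpark, not from the text.
    """
    return doubling_time_years * math.log2(1.0 / rr)


# 5 hours/week from age 20 to 60, at 7 hours returned per hour spent.
print(round(exercise_dividend_years(5, 40), 1))  # 8.3

# Shift implied by RR = 0.74 (the 300 minutes/week result above).
print(round(gompertz_shift_years(0.74), 1))  # 3.5
```

The two numbers come from different models and shouldn’t be expected to match; what they share is that both land in single-digit years.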

(While we’re exercising: keep in mind that losing weight isn’t always good: if you’re already at a healthy weight and start losing weight without intending to, that could be a sign that you’re sick and don’t know it yet (source).)

Other studies I looked at:

Unfortunately, most of these studies are based on surveys, which have the usual problems with self reports. There are some studies based on measuring VO2max more rigorously as a proxy for fitness, except those have tiny Ns, in the tens if they’re lucky (it’s expensive to measure VO2max!).

Diet

Overall, many of these studies are observational and based on self-reports; a few are based on randomized provided food, but the economics dictate they have smaller Ns. I’ve put all the diet-related things together, since in aggregate they are fairly impactful (if difficult to put into practice), but note that some of the subheadings contain less certain results.

Fruit and vegetables

It’s like your childhood authority figures said: eat your vegetables.

From “Fruit and vegetable consumption and mortality from all causes, cardiovascular disease, and cancer: systematic review and dose-response meta-analysis of prospective cohort studies” (2014, N=833,234):

» +1 serving fruit or vegetable/day, HR = 0.95[0.92, 0.98]

Like exercise, fruits/vegetables don’t stack forever either; there’s around a 5 serving/day limit after which effects level off. Still, that adds up to around HR = 0.75, competitive with maximally effective exercise.
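Here is a small sketch of where a number like that comes from, assuming the per-serving HR simply multiplies for each extra serving and stops improving at the 5 serving/day plateau. Both the multiplicative compounding and the hard cap are simplifications of mine, and `compounded_hr` is a made-up name.

```python
def compounded_hr(hr_per_serving: float, servings: int, cap: int = 5) -> float:
    """HR after `servings` daily servings, ASSUMING multiplicative stacking up to `cap`."""
    effective_servings = min(servings, cap)
    return hr_per_serving ** effective_servings


for servings in range(0, 8):
    print(servings, round(compounded_hr(0.95, servings), 3))
```

Naive compounding of the point estimate gives roughly 0.77 at five servings, in the same neighborhood as the figure above. Running the CI bounds (0.92 and 0.98) through the same function shows how wide the uncertainty gets once it is compounded.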

Potatoes are a notable exception, having a uniquely high glycemic load among vegetables; this roughly means that your blood sugar will spike after eating potatoes, which seems bad. You can find plenty of debate about whether this is in fact bad[6].

Other reports I looked at:

Red/Processed Meat

You know bacon is bad for you, but… bacon is pretty bad for you.

From “Red Meat and Processed Meat Consumption and All-Cause Mortality: A Meta-Analysis” (2013, N=unknown subset of 1,330,352) effects from both plain red meat (hamburger, steak) and processed red meat (dried, smoked, bacon):

» Highest vs lowest consumption categories[7] for red meat, RR = 1.10[0.98, 1.22]

» Highest vs lowest consumption categories for processed red meat, RR = 1.23[1.17, 1.28]

There isn’t all-cause data I could find on fried foods specifically, but “Intake of fried meat and risk of cancer: A follow-up study in Finland” specifically covers cancer risks (1994, N=9,990):

» Highest vs lowest tertile fried meat: RR = 1.77[1.11, 2.84]

Note that the confidence intervals are wide: for example, the red meat CI covers 1.0, which is pretty poor (and yet the best all-cause data I could find). If we were strictly following NHST (null hypothesis significance testing), we’d reject this conclusion. However, I’ll begrudgingly accept waggled eyebrows and “trending towards significance”[8].

If you’re paleo, you might not have cause to worry, since you’re probably eating better than most other red meat eaters, but I have no data for your specific situation.

Other reports I looked at:

Fish (+Fish oil)

Fish is pretty good for you! Fish oil might contribute to fish “consumption”.

“Risks and benefits of omega 3 fats for mortality, cardiovascular disease, and cancer: systematic review” (2006, N=unknown subset of 36,913) looked at both fish consumption and fish oil, finding that fish/fish oil weren’t significantly different:

» High omega-3 (both advice to eat more fish, and supplementation) vs low, RR = 0.87[0.73, 1.03]

Note this analysis only included RCTs.

“Association Between Omega-3 Fatty Acid Supplementation and Risk of Major Cardiovascular Disease Events: A Systematic Review and Meta-analysis” (2012, N=68,680) looked only at fish oil supplementation:

» Omega-3 supplementation vs none, RR = 0.96[0.91, 1.02]

Note that both of these results have relatively wide CIs covering 1.0. Additionally, the two studies seem to differ on the relative effectiveness of fish oil.

There’s plenty of exposition on mechanisms for why fish oil (omega-3 oil) might help in the AHA scientific statement “Fish Consumption, Fish Oil, Omega-3 Fatty Acids, and Cardiovascular Disease”.

Also make sure that you’re not eating mercury laden fish while you’re at it; just because Newton did it doesn’t mean you should.

Other studies I looked at:

Nuts

This study of Seventh-day Adventists, “Nut consumption, vegetarian diets, ischemic heart disease risk, and all-cause mortality: evidence from epidemiologic studies”, points in the right direction (1999, N=34,198):

» Eating nuts >=5 times/week vs <1 time/week, fatal heart attack RR ~ 0.5[0.32, 0.75] (estimated from a graph)

However, I don’t trust it. Look at how implausibly low that RR is: eating nuts is better than getting the maximum benefit from exercise? How in the world would that work? Unfortunately, I wasn’t able to find any studies that weren’t confounded by religion, so I just have to stay uncertain for now.

Sleep

We spend a third of our lives asleep, so of course it matters. The easiest thing to measure about sleep is the length, so plenty of studies have been done on that. You want to hit a Goldilocks zone of sleep length, neither too short nor too long. The literature calls this the aptly named U-shape response.

What’s too short, or too long? It’s frustrating, because one study’s “too long” can be another study’s “too short”, and vice versa.

However, from “Sleep Duration and All-Cause Mortality: A Systematic Review and Meta-Analysis of Prospective Studies” (2010, N=1,382,999):

» Too short (<4-7h), RR = 1.12[1.06, 1.18]

» Too long (>8-12h), RR = 1.30[1.22, 1.38]

And from “Sleep duration and mortality: a systematic review and meta-analysis” (2009, N=unknown):

» Too short (<7h), RR = 1.10[1.06, 1.15]

» Too long (>9h), RR = 1.23[1.17, 1.30]

So there’s a range right around 8 hours that most studies can agree is good.

You might be fine outside of the Goldilocks zone, but if you haven’t made special efforts to get into the zone, you might want to try and get into that 7-9h zone the studies can generally agree on.

Again, most of these studies are survey based. I can’t find the source, but a possible unique confounder is that sleeping unusually long might be a dependent, not independent variable: if you’re sick but don’t know it, one symptom could manifest as sleeping more.

And, if you get enough sleep but feel groggy, you might want to get checked out for sleep apnea.

Other studies I looked at:

Less Effective

Flossing

The original longevity guide was enthusiastic about flossing. Looking at “Dental Health Behaviors, Dentition, and Mortality in the Elderly: The Leisure World Cohort Study” (2011, N=690), it’s hard not to be:

» Among daily brushers, never vs everyday flossers, HR = 1.25[1.06, 1.48]

Even more exciting is the dental visit results (N=861):

» Dental exam twice/year vs none, HR = 1.48[1.23, 1.79]

However, the study primarily covers the elderly, with an average age of 81yo. Sure, one hopes that the effects are universal, but the non-representative population makes it hard to generalize. So while flossing looks good, I’m not ready to trust one study, especially when I can’t find a reasonable meta-analysis covering more than a few hundred people.

As a counterpoint, Cochrane looked at flossing specifically in “Flossing to reduce gum disease and tooth decay” (2011, N=1083), finding that there’s weak evidence for reduction in plaque, but basically nothing else.

I’ll keep flossing, but I’m not confident about the impact of doing so.

Other studies I looked at:

Sitting

Sitting down all day might-maybe-possibly be bad for health outcomes.

There are some studies trying to measure the impact of sitting length. From “Daily Sitting Time and All-Cause Mortality: A Meta-Analysis” (2013, N=595,086):

» +1 hour sitting with >7 hours/day sitting, HR = 1.05[1.02, 1.08]

However, the aptly named “Does physical activity attenuate, or even eliminate, the detrimental association of sitting time with mortality? A harmonised meta-analysis of data from more than 1 million men and women” (2016, N=1,005,791, no full text) claims the correlation only holds at low levels of activity: once people start getting close to the exercise limit, this study found the correlation between sitting and all-cause mortality disappeared.

From “Leisure Time Spent Sitting in Relation to Total Mortality in a Prospective Cohort of US Adults” (2010, N=53,440):

» Sitting >6 hours vs <3 hours/day (leisure time), RR = 1.17[1.11, 1.24]

Note that this is the effect for men: the effect for women is larger. Also, this study directly contradicts the other study, claiming that sitting time has an effect on mortality regardless of activity level. And who in the world sits for less than 3 hours/day during their leisure time? Do they just not have leisure time?

Again, these studies were survey based.

The big unanswered question in my mind is whether exercising vigorously will just wipe the need to not sit. So, I’m not super confident you should get a fancy sit-stand desk.

(However, I do know that writing this post meant so much sitting that my butt started to hurt, so even if it’s not for longevity reasons I’m seriously considering it.)

Other reports I looked at:

Air quality

Air quality has a surprisingly small impact on all-cause mortality.

From “Meta-Analysis of Time-Series Studies of Air Pollution and Mortality: Effects of Gases and Particles and the Influence of Cause of Death, Age, and Season” (2011, N=unknown (but aggregated from 109 studies(?!))):

» +31.3 μg/m3 PM10, RR = 1.02[1.015, 1.024]

» +1.1 ppm CO, RR = 1.017[1.012, 1.022]

» +24.0 ppb NO2, RR = 1.028[1.021, 1.035]

» +31.2 ppb O3 daily max, RR = 1.016[1.011, 1.020]

» +9.4 ppb SO2, RR = 1.009[1.007, 1.012]

(I’m deriving the RR from percentage change in mortality.)

By themselves the RR increments aren’t overwhelming. But since they’re expressed as increments, if there are 50 increments present in a normal day that we can filter out ourselves, then that adds up to some real impact. The increments aren’t tiny compared to absolute values, though. For example, maximum values in NYC during the 2016 summer:

PM10 ~ 58 μg/m3

CO ~ 1.86 ppm

NO2 ~ 60.1 ppb

O3 ~ 86 ppb

SO2 ~ 7.3 ppb

So the difference between a heavily trafficked metro area and a clean room is maybe twice the percentage impacts we’ve seen, which just doesn’t add up to very much. Beijing is another story, but even then I (baselessly) question the ability of household filtration systems to make a sizable dent in interior air quality.
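The “maybe twice” estimate can be checked with a quick sketch: divide each NYC maximum by the study’s increment to get the number of increments, then compound the per-increment RR that many times. This assumes a clean room sits at zero for every pollutant, that the effect is log-linear in concentration, and that each pollutant can be looked at on its own; none of those assumptions come from the study.

```python
# pollutant: (study increment, RR per increment, NYC summer 2016 maximum)
# Units match within each row (μg/m3 for PM10, ppm for CO, ppb for the rest).
POLLUTANTS = {
    "PM10": (31.3, 1.020, 58.0),
    "CO": (1.1, 1.017, 1.86),
    "NO2": (24.0, 1.028, 60.1),
    "O3": (31.2, 1.016, 86.0),
    "SO2": (9.4, 1.009, 7.3),
}


def increments_present(increment: float, observed: float) -> float:
    """How many study-sized increments fit between zero and the observed level."""
    return observed / increment


def compounded_rr(rr_per_increment: float, n_increments: float) -> float:
    """RR for n increments, ASSUMING the log of the RR is linear in concentration."""
    return rr_per_increment ** n_increments


for name, (increment, rr, observed) in POLLUTANTS.items():
    n = increments_present(increment, observed)
    print(f"{name}: {n:.1f} increments, RR ~ {compounded_rr(rr, n):.3f}")
```

Every pollutant comes out at fewer than three increments, and none of the compounded RRs gets far from 1.0, which is the basis for the “doesn’t add up to very much” reading.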

There are plenty of possible confounders: it seems the way these sorts of studies are run is by looking at city-level pollution and mortality data, and running the regressions on those data points.

Other studies I looked at:

Hospital Choice

Going to the hospital isn’t great: medical professionals do the best they can, but they’re still human and can still screw up. It’s just that the stakes are really high. Like, people recommend marking on yourself which side a pending operation should be done on, to reduce chances of catastrophic error.

Quantitatively, “A New, Evidence-based Estimate of Patient Harms Associated with Hospital Care” (2013) says that 1% of deaths in the hospital are adverse deaths. However, note that many adverse deaths weren’t plausibly preventable by anyone other than Omega.

If you’re having a high stakes operation done, “Operative Mortality and Procedure Volume as Predictors of Subsequent Hospital Performance” (2006) recommends taking into account a hospital’s historical mortality rate and volume for that procedure: if you’re getting heart surgery, you want to go to the hospital that handles lots of heart surgeries, and has done so successfully in the past.

Other studies I looked at:

Green tea

Unfortunately, there’s no all-cause mortality data on the impact of tea in general, or green tea in particular. We might expect it to have an effect through flavonoids.

As a proxy, though, we can look at blood pressure, where lower blood pressure is better. From “Green and black tea for the primary prevention of cardiovascular disease” (2013, N=821):

» Systolic blood pressure, -3.18[-5.25, -1.11] mmHg

» Diastolic blood pressure, -3.42[-4.54, -2.30] mmHg

There’s a smaller effect from black tea, around half the size.

Cochrane also looked at green tea prevention rates for different cancers. From “Green tea for the prevention of cancer” (2009, N=1.6 million), it’s unclear whether there’s any strong evidence of effect for any cancer, in addition to there being a possible garden of forking paths.

If you’re already drinking tea, like me, then switching to green tea is low cost despite any questions about efficacy.

Borderline efficacy

Baby Aspirin

“Baby aspirin” is the practice of taking tiny daily doses of aspirin, mainly to combat cardiovascular disease. From “Low-dose aspirin for primary prevention of cardiovascular events in Japanese patients 60 years or older with atherosclerotic risk factors: a randomized clinical trial.” (2014, N=14,464):

» Aspirin vs none, aggregate cardiovascular mortality HR = 0.94[0.77, 1.15]

That CI width is very concerning; you can cut the data so you get subsets of cardiovascular mortality to become significant, like looking at only non-fatal heart attacks, but it’s not like there’s a breath of correcting for multiple comparisons anywhere, and the study was stopped early due to “likely futility”.

The side effects of baby aspirin are also concerning. Internal bleeding is possible (Mayo clinic article), since aspirin is acting as a blood thinner; however, it isn’t too terrible, since it’s only a 0.13% increase in “serious bleeding” that resulted in hospitalization (from “Systematic Review and Meta-analysis of Adverse Events of Low-dose Aspirin and Clopidogrel in Randomized Controlled Trials” (2006)).

More concerning is the stopping effect. “Low-dose aspirin for secondary cardiovascular prevention – cardiovascular risks after its perioperative withdrawal versus bleeding risks with its continuation – review and meta-analysis” looked into cardiovascular risks when stopping a baby aspirin regime before surgery (because of increased internal bleeding risks), and found that a low single-digit percentage of heart attacks happened shortly after aspirin discontinuation. (I’m having trouble interpreting this report.)

I imagine this is why professionals start recommending baby aspirin to folks above 50yo, since the risks of heart attack start to obviously outweigh the costs of taking aspirin constantly. And speaking of cost: baby aspirin is monetarily inexpensive.

Other studies I looked at:

Meal Frequency

Some people recommend eating smaller meals more frequently, particularly to lose weight, which is tied to health outcomes.

From “Effects of meal frequency on weight loss and body composition: a meta-analysis” (2015, N=unknown):

» +1 meal/day, -0.27 ± 0.11 kg of fat mass

It’s not really an overwhelming result; taking into account the logistical overhead of planning out extra meals in a society based on 3 square meals a day, is it really worth it to lose maybe half a kilogram of fat?

Other studies I looked at:

Caloric Restriction

Most longevity folks are really on board the caloric restriction (CR) train. There’s an appealing mechanism where lower metabolic rates produce fewer free radicals to damage cellular machinery, and it’s the exact amount of effort that one might expect from a longevity intervention that actually works.

A common example of CR is the Japanese Ryukyu islands, where there are a surprising number of really old people, who eat a surprisingly low number of calories. However, say it with me: con-found-ed to he-ll! The fact that a single isolated subsection of a single ethnic group has a correlation between CR and longevity doesn’t make me confident that I too can practice CR and tell death to fuck off for a few more years.

So we want studies. Unfortunately, most humans fall into the state of starving and lacking essential nutrients, or having enough calories and nutrients, but almost never the middle ground of having too few calories but all the essential nutrients (2003, literature review). Then there’s the ethics of getting humans to agree to a really long study that controls their diet, so let’s look at animal studies first.

However, different rhesus monkey studies give different answers.

» From “Impact of caloric restriction on health and survival in rhesus monkeys from the NIA study” (2012, N=unknown, no full text), there was no longevity increase in young or old rhesus monkeys.

» However, from “Caloric restriction delays disease onset and mortality in rhesus monkeys” (2009, N=76), there was a 30% reduction in death over 20 years.

Thankfully they’re both randomized, but it doesn’t really help when they end up with conflicting conclusions. You’d hope there would be better support even in animal models for something that should have huge impacts.

What else could we look at? We’re not going to wait for an 80-year human study to finish (the ongoing CALERIE study comes close), so maybe we could look at intermediate markers that are known to have an impact on longevity and go from there.

A CALERIE checkpoint study, “A 2-Year Randomized Controlled Trial of Human Caloric Restriction: Feasibility and Effects on Predictors of Health Span and Longevity” (2015, N=218), looks at the impact of 25% CR on blood pressure:

» Mean blood pressure change, around -3 mmHg (read from a chart)

Pretty good, but that’s also around the impact of green tea. Then, there’s the implied garden of forking paths bringing in multiple comparisons, since the study in the same cluster looks at multiple types of cholesterol and insulin resistance markers.

Finally, there are the costs: you have to exert plenty of willpower to actually accomplish CR. For something with such large costs, the evidence base just isn’t there.

Chocolate

Chocolate has some impact on blood pressure. “Effect of cocoa on blood pressure” (2017, N=1804, Cochrane) finds that eating chocolate lowers your blood pressure:

» Systolic blood pressure, -1.76[-3.09, -0.43] mmHg

» Diastolic blood pressure, -1.76[-2.57, -0.94] mmHg

However, if you’re normotensive then there’s no impact on blood pressure, and only taking into account hypertensives the effect jumps to -4 mmHg. Feel free to keep eating your chocolate, but don’t expect miracles.

Social Interaction

Having a social life looks like a really great intervention.

From “Social Relationships and Mortality Risk: A Meta-analytic Review” (2010, N=308,849):

» Weaker vs stronger relationships, OR = 1.50[1.42, 1.59]

And from “Social isolation, loneliness, and all-cause mortality in older men and women” (2013, N=6500):

» Highest vs other quintiles of social isolation, HR = 1.26[1.08, 1.48]

And from “Marital status and mortality in the elderly: A systematic review and meta-analysis” (2007, N>250,000, no full text):

» Married vs all currently non-married, RR = 0.88[0.85, 0.91]

You can propose a causal mechanism off the top of your head: people with more friends are less depressed which just has good health outcomes.

However, the alarm bells should be ringing: is the causal relationship backwards? Are healthier people more prone to socializing? Do the confounders never end? The kicker is that all these studies are looking at the elderly (above 50yo at least), which reduces their general applicability even more.

Other studies I looked at:

Cellphone Usage

Remember when everyone was worried that chronic cellphone usage was going to give us all cancer?

Well “Mobile Phone Use and Risk of Tumors: A Meta-Analysis” (2008, N=37,916) says it actually does:

» Overall tumor, OR = 1.18[1.04, 1.34]

» Malignant tumor, OR = 1.00[0.89, 1.13]

Since we’re worried about malignant tumors, it’s hard to say we should be worried by cellphones.

Other studies I looked at:

Unproven

Confusing thirst with hunger

Some people recommend taking a drink when you feel hungry, the idea being that thirst sometimes manifests as hunger, and you can end up eating fewer calories.

Unfortunately, I couldn’t find any studies that tried to look into this specifically: the closest thing I found was “Hunger and Thirst: Issues in measurement and prediction of eating and drinking” (2010) which reads like a freshman philosophy paper, and “Thirst-drinking, hunger-eating; tight coupling?” (2009, N=50?) which fails to persuade me about… anything, really.

Stress Reduction in a Pill

There are some “natural” plants rumored to have stress reduction effects, Rhodiola rosea and Ashwagandha root.

Meta-analysis on Rhodiola, “The effectiveness and efficacy of Rhodiola rosea L.: A systematic review of randomized clinical trials” (2011, N=unknown) found that Rhodiola had effects on something, but the study was basically a fishing expedition. Even the study name betrays that it doesn’t matter what it’s effective at, just that it’s effective.

Another meta-analysis, “Rhodiola rosea for physical and mental fatigue: a systematic review” (2012, N>176) looked specifically at fatigue and found mixed results.

A single study on Ashwagandha, “Prospective, Randomized Double-Blind, Placebo-Controlled Study of Safety and Efficacy of a High-Concentration Full-Spectrum Extract of Ashwagandha Root in Reducing Stress and Anxiety in Adults” (2012, N=64) found reductions in self-reported stress scales and cortisol levels (and it’s an RCT!).

Look, the Ns are tiny, and the studies the meta-analyses are based on are old, and who knows if the Russians were conducting their side of the studies right (Rhodiola originated in Russia, so many of the studies are Russian).

I’m including this because I got excited when I saw it in the original longevity post: stress reduction in a pill! Why do the hard work of meditation when I could just pop some pills (a very American approach, I know)? It just doesn’t look like the evidence base is trustworthy, and my personal experiences confirm that if there’s an effect it’s subtle (Whole Foods carries both Rhodiola and Ashwagandha, so you can try them out for yourself for like $20).

Other studies I looked at:

Water Filters

Unfortunately, there’s basically no research on health effects from water filtration in 1st world countries above and beyond municipal water treatment. Most filtration research is either about how adding any filtration to 3rd world countries has massive benefits, or how bacteria can grow on activated carbon granules. Good to know, but on reflection did we expect bacteria to stop growing wherever it damn well pleases?

So keep your Brita filter, but it’s not like we know for sure whether it’s doing anything either. Probably not worth it to go out of your way to get one.

Hand sanitizer

So I keep hand sanitizer in multiple places in my apartment, but does it do anything?

I only found “Effectiveness of a hospital-wide programme to improve compliance with hand hygiene” (2000, N=unknown), which focused on hospital health outcomes impacted by hand washing adherence. First, not all doctors wash their hands regularly (40% compliance rates in 2011) (scholarly overview), which is worrying. Second, there’s a positive trend between hand washing (including hand sanitizers) and outcomes:

» From moving 48% hand washing adherence to 66%, the hospital-wide infection rate decreased from 16.9% to 9.9%.

However, keep in mind that home and work are usually less adverse environments than a hospital; there are fewer people with compromised immune systems, there are fewer gaping wounds (hopefully). The cited result is probably an upper bound for us non-hospital folk.

(There’s also this cute study: hand sanitizer contains chemicals that make it easier for other chemicals to penetrate the skin, and freshly printed receipts have plenty of BPA on the paper. This means that sanitizing and then handling a receipt will lead to a spike of BPA in your bloodstream. I presume that relative to eating with filthy hands the BPA impact is negligible, but damn it, researchers are doing these cute small scale studies instead of the huge randomized trials I want.)

Doctor visits

Should you visit your doctor for an annual checkup? My conscientious side says “of course”, but my contrarian side says “of course not”.

Well, “General health checks in adults for reducing morbidity and mortality from disease” (2012, N=182,880, Cochrane) says:

» Annual checkup vs no exam, RR = 0.99[0.95, 1.03]

So basically no impact! Ha, take that, couple hour appointment!

However, The Chicago Tribune notes some mitigating factors, like the main studies the meta-analysis is based on are old, like 1960s old.

Metformin

I didn’t look at metformin in my main study period: I knew it had some interesting results, but it also caused gastrointestinal distress, better known as diarrhea. It brings to mind the old quip: metformin doesn’t make you live longer, it just feels like it[9].

However, while I was reading Tools of Titans, Dominic D’Agostino floated an intriguing idea: he would titrate the metformin dose up from some tiny amount until he started exhibiting GI symptoms, and then dial it back a touch. I don’t think people have even started doing small scale studies around this, but it might be worth looking into.

Other

There’s some stuff that doesn’t have a cost-benefit calculation attached, but I’m including it anyways. Or, there are things that won’t help you, but might help the people around you.

CPR

From “Effectiveness of Bystander-Initiated Cardiac-Only Resuscitation for Patients With Out-of-Hospital Cardiac Arrest” (2007, N=4902 heart attacks):

» Cardiac-only CPR vs no CPR, OR 1.72[1.01, 2.95]

So the odds ratio looks pretty good, except that the CI is really wide, and in absolute terms most people still die from heart attacks: administering CPR raises the chances of survival from 2.5% to 4.3%. So, spending more than a few hours practicing CPR is chasing some really tail risks[10].
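To make the jump from odds ratio to absolute survival concrete, here is a minimal sketch. It assumes the 2.5% no-CPR baseline quoted above and applies the odds ratio directly; the function name is my own, and the paper's adjusted model will not be reproduced exactly by this back-of-envelope conversion.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by applying an odds ratio to a baseline probability."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.025  # survival without CPR, as quoted in the text (assumed as the baseline)

# Point estimate and both ends of the confidence interval from the study.
for label, odds_ratio in [("CI low", 1.01), ("point estimate", 1.72), ("CI high", 2.95)]:
    survival = apply_odds_ratio(baseline, odds_ratio)
    print(f"{label}: OR {odds_ratio} -> {survival:.1%} survival")
```

The point estimate lands in the low 4% range, in the same ballpark as the figure quoted above, and the low end of the interval is indistinguishable from doing nothing.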

However, have two people in your friend group that know CPR, and they can provide a potential buff to everyone around them (two, because you can’t give CPR to yourself). In a similar vein, the Heimlich maneuver might be good to know.

Smoke Alarm testing

Death by fire is not super common. That said, these days it’s cheap to set up a reminder to check your alarm on some long interval, like 6 months.

Quikclot

It’s unlikely you’ll need to do trauma medicine in the field, but if you’re paranoid about tail risk then quikclot (and competitors) can serve as a buttress against bleeding out. Some folks claim that tourniquets are better, but the trauma bandages are a bit more versatile, since you can’t tourniquet your chest.

It’s not magical: since the entire thing becomes a clot, it’s basically just moving a life threatening wound from the field into a hospital. Also make sure to get the bandage form, not the powder; some people have been blinded when the wind blew the clot precursor into their eyes.

Cryonics

Of course, this post wouldn’t be complete without a nod to cryonics. It’s the ultimate backstop. If all else fails, there’s one last option to make a Hail Mary throw into the future.

Obviously there are no empirical RR values I can give you: you’ll have to estimate your own probabilities and weigh your own values.

WTF, Science?

The overarching story is that we cannot trust anything, because almost all the studies are observational, and everything could be confounded to hell by things outside the short list that every study incants they controlled for, and we would have no idea.

Like Gwern says, even the easiest things to randomize, like giving people free beer, aren’t being done, much less on a scale that could give us some real confidence.

There is too little regard for the garden of forking paths in this post-replication crisis world, and many studies are focused on subgroups that plausibly won’t generalize (ex. the elderly).

And what’s up with the heterogeneity in meta-analyses? If every single analysis results in “these results displayed significant heterogeneity”, then what’s the point? What are we doing wrong?

What am I doing?

Maybe you want to know what I myself am doing; I suspect people would be interested for the same reason journalists intersperse a perfectly good technical thriller with human interest vignettes, so here:

  • Continuing vitamin D supplementation, and getting a couple minutes of sun when I can.
  • Making an effort to eat more vegetables, less bacon/potatoes (to be honest, I’m more optimistic about cutting out the bacon than potatoes), more fish, and replacing more of my snacking with walnuts.
  • Keep taking fish oil.
  • Exercise better: I haven’t upped the intensity of my routine in a while. I probably need some more aerobic work, too.
  • Tell myself I should iron out my sleep schedule.
  • Get myself a standing desk for home: I have a standing desk at work, so I’m already halfway there.
  • Buy an air filter: low impact, but whatever, gimmie my percentage points of RR.
  • Switch from drinking black tea to green tea.
  • Cut back on donating blood. I’ll keep doing it because it’s also wrapped up in “doing good things”, but I was doing it partly selfishly based on the non-quasi-randomized studies. Besides, I have shitty blood.

TLDR

Effective and certain:

  • Supplement vitamin D.

Effective, possibly confounded:

  • Exercise vigorously 5 hours/week.
  • Eat more fruits and vegetables, more fish, less red meat, cut out the bacon.
  • Get 7-9 hours of sleep.

Less effective, less certain:

  • Brush your teeth and floss daily.
  • Try to not sit all day.
  • Regarding air quality, don’t live in Beijing.

There is also a presentation.


[1]  If you need me to go through the science of smoking, then let me know and I can do so: I mostly skipped it because I’m already not smoking, and the direction of my study was partly determined by what could be applicable to me. As a non-smoker, I didn’t even notice it was missing until a late editing pass.

[2]  The abstract reports results in terms of percentage mortality decrease, which I believe maps to the same RR I gave.

[3]  If I remember correctly, Due Diligence talks about this.

[4]  The Cochrane Group does good, rigorous analysis work. Gwern is an independent researcher in my in group, and he seems to be better at this sort of thing than I am.

[5]  Annoyingly, some meta-analyses don’t report the aggregate sample sizes for analyses that only use a subset of the analyzed reports.

[6]  For example, Scott’s review of The Hungry Brain points out that some people think potatoes are great at satiating appetites, so it might in fact work out in favor of being okay.

[7]  These category comparisons are loose, since some studies will report quartiles and others will use tertiles, so the analysis simply goes with the largest effect possible across all studies.

[8]  Yes, it’s fucking stupid I have to stoop to this.

[9]  Originally “marriage doesn’t make you live longer, it just feels like it.”

[10]  I know, it’s ironic that I’m calling this a tail risk, when we’re pushing something as stubborn as the Gompertz curve.

9s of Cats


Epistemic status: value judgement.

The internet has a lot of cat pictures.

Let's say I upload a cat picture to Amazon's Simple Storage Service (S3). As of writing, their marketing materials claim that a stored object is 99.999999999% likely to stay securely stored in a year, which translates into a 50% chance of losing a given cat picture once every 70 billion years[1]. In storage/networking jargon, this is 11 9's of durability, a sort of fast n' dirty logarithmic shorthand for stating how reliable a service is, found by counting the 9s in the percentage. For example, 99.9% would be 3 9's.
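The 70 billion year figure can be sanity-checked with a few lines. This is a sketch under one big assumption, namely that every year is an independent trial with the same loss probability, which is my simplification and not something Amazon promises; the function name is made up for illustration.

```python
import math


def years_to_even_odds_of_loss(annual_loss_probability: float) -> float:
    """Years until a single object has a 50% chance of having been lost.

    Assumes each year is an independent trial with the same loss probability.
    """
    # log1p keeps precision when the loss probability is tiny.
    return math.log(0.5) / math.log1p(-annual_loss_probability)


eleven_nines_loss = 1e-11  # complement of 99.999999999% durability
years = years_to_even_odds_of_loss(eleven_nines_loss)
print(f"{years / 1e9:.0f} billion years")
```

The result comes out just under 70 billion years, matching the rounded number above.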

This doesn't mean that Amazon is super optimistic and thinks the chance of total nuclear war or perfect storm pandemic is some tiny percentage. It's just that if civilization does collapse then former customers would want Amazon warriors over Amazon refunds. Conditional on the continued existence of Amazon, the business, they'll probably keep doing crazy replication schemes[2] to maintain those crazy guarantees.

However, smaller apocalypses will leave Amazon broken while humanity lives on[3]. In these futures, I could easily imagine children gathered around a working fin de siècle computer wondering why in the world this cat looks so grumpy?

So certain cat photos might in fact have 11 9's of durability, enough to live 9 lives over and over. What about humans?

Looking at the 2014 CDC death rates, there are 823.7 deaths/100,000 people, working out to a 99.18% annual durability for a randomly selected human (American), for a measly 2 9s of durability. If you show someone a cat picture when they are 12, at best you can expect them to hold onto that memory with 2 9s of durability, because after that they are likely dead[4].

Cat pictures hold together with 11 9s: humans hold together with 2 9s.
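For the curious, counting 9s is just a base-10 logarithm of the annual loss probability, rounded down. A minimal sketch, using the CDC rate and the S3 claim quoted above; the helper name and the small epsilon are my own choices, not any standard definition.

```python
import math


def count_nines(annual_loss_probability: float) -> int:
    """Number of leading 9s in the annual survival percentage."""
    # The small epsilon guards against floating-point error at exact powers of ten.
    return math.floor(-math.log10(annual_loss_probability) + 1e-9)


human_loss = 823.7 / 100_000  # 2014 CDC death rate quoted above
s3_loss = 1e-11               # complement of 99.999999999%

print(f"human annual durability: {1 - human_loss:.2%}")
print("human nines:", count_nines(human_loss))
print("S3 nines:", count_nines(s3_loss))
```

Working from the loss probability rather than the durability avoids subtracting two nearly equal floating-point numbers, which would otherwise eat some of the 9s.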

It seems a little incongruous, yes? One is a chuckle-worthy image, and the other is a person.

I mean, there is a good reason, one is much more complicated than the other. Grumpy Cat herself will die far before her image does (maybe that's why she's grumpy?). We can barely simulate nematode neural systems, and even simply finding a human's brain connectome (connection graph) is still prohibitively expensive, much less running the entire graph forwards in time[5].

Instead of doing the naive thing suggested by the S3 analogy and trying to scan people to replicate them across availability zones[6], we could simply extend their lives. For example, we boosted the general US life expectancy from 40 years to 80 years since the early 1800s. But note:


y(t) = a \cdot e^{-b \cdot e^{-c \cdot t}}

It's not even "fuck the natural logarithm", it's "fuck the double logarithm". If we find some fantastic intervention in a pill that reduces our relative risk of death by half without any side effects, that halves the b value, which means this only moves the curve over a few years[10]:

[Figure: a graph showing two curves, one with normal humans, and the other with humans that have half the relative risk (RR).]

We'll somehow need to invalidate this model with our mental fists.
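To see why halving b only buys a few years, take the formula above at face value: b/2 · e^(-c·t) equals b · e^(-c·(t + ln 2 / c)), so halving b is the same as sliding the whole curve sideways by ln 2 / c. Here is a minimal sketch checking that numerically. The parameter values are hypothetical, picked only for illustration, and are not the ones used in the linked plotting script.

```python
import math


def gompertz(t: float, a: float, b: float, c: float) -> float:
    """The Gompertz function y(t) = a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))


# Hypothetical parameters, chosen only for illustration.
a, b, c = 1.0, 5.0, 0.085

# Halving b is equivalent to shifting the curve along the t axis by ln(2) / c.
shift = math.log(2) / c

for t in (40, 60, 80):
    halved = gompertz(t, a, b / 2, c)
    shifted = gompertz(t + shift, a, b, c)
    print(f"t={t}: halved-b {halved:.6f}, shifted {shifted:.6f}")

print(f"shift: {shift:.1f} years")
```

The size of the shift depends only on c, not on how large b was to begin with, which is what makes the curve so stubborn: each further halving of risk buys the same fixed number of years.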

(At this point, I should point out that there are some people working on the problem with an eye towards halting or reversing aging[11], like The SENS Foundation and The Methuselah Foundation. They are nonprofits, and could always use more money: if nothing else, they could make a bigger incentive prize of the XPrize sort.)

But I didn't write this post to complain about our problems, I wrote this post because:

  1. coining "9s of cats" was too tempting to pass up.
  2. consider this a weak post-pre-registration[12] of an informal study I did for well supported longevity actions we common folk can do today. Sure, the things we do are still subject to the steep demands of the Gompertz curve, but we want to maximize our chances of hitting Kurzweil's escape velocity if/when it happens.

Stay tuned.

Previously.


[1]  Note that this is for a specific object, and not for a set of objects. If you have 100 billion objects, you might see one of them go missing in a year, and that would be within the guarantee.

[2]  If you want an example of the sorts of replication large tech companies use, you can check out Facebook's blob store.

[3]  Note that while I work for a competitor of Amazon, I don't intend for this to be a pleasant daydream, but a nightmarish one. Also, it bears repeating that I do not presume to speak for my employer, etc etc.

[4]  This doesn't even include things like Alzheimer's, which destroy the people without destroying their bodies.

[5]  Contrast this with genome sequencing costs, which have dived faster than exponentially. Today, you can get your genome sequenced for around $1k (the cost is sitting behind some cost request, but I've heard from biologists that Illumina whole-genome sequencing is around that much. Veritas Genetics also has a quote for around that much). It's possible that high resolution scanning technology will hit a similar trend, but it might not.

[6]  Availability zones: broad sectors with non-overlapping support, the theory being that bringing down one zone doesn't bring down the others. Concretely, it would be harder to kill you for good if you had copies living in both Europe and Asia.

[7]  Quip appropriately lifted from Ra, the Space Magic chapter.

[8]  To be fair, that's going from "lol leeches for everyone" to "well, let's scrape your bones out and put them in another person, and hey presto, they stopped dying!".

[9]  More by Gwern on his longevity page.

[10]  Graphic generated using an R+ggplot2 script, available as a GitHub Gist. I use the same curve that Gwern does circa 2017.

[11]  There are arguments against extending human lifespans, like overcrowding, but that's silly. Droning on about the sanctity of death because it's the Dark Ages is fine, but defaulting to death because oh no there are problems to overcome is a damn defeatist attitude. If you haven't read Bostrom's The Fable of the Dragon Tyrant, it's a gentle storytale introduction to non-deathism.

[12]  A pre-registration, so I can't just sweep things under the table, and weak, because I've already done the bulk of the research and analysis.

Subdermal Scientific Delivery

Epistemic status: crap armchair theorizing.

PutANumOnIt points out that psychology is broken. Having read Robyn Dawes’ House of Cards and Andrew Gelman’s post on the replication crisis, I agree with him: it is kind of crappy that it’s been years since the replication crisis and still nothing seems to have changed.

However, I disagree with the shape of his reaction, both online and in person (I was in the same room with him and the psychology student). What he said was true and necessary, but his frustration wasn’t usefully channeled. I think that adding the 3rd Scott Alexander comment requirement[1], kindness, would have at least very minutely helped move us towards a world of better science.

Why kindness? Well, how could we fix psychology without it? Some fun ideas:

  • The government could set higher standards for the science it funds.
  • Scientific journals could uphold higher standards.
  • The universities that host the psychology professors could start demanding higher standards from the professors, like for granting tenure.
  • The APA (American Psychological Association) could publish guidelines pushing for higher standards[2].
  • Psychology curriculum writers could emphasize statistics more.

If we could do any of these with a wave of a wand, any one of these would… well, wouldn’t end the crisis, but it would push things in the right direction.

However, we don’t have a wand, so I’m not confident any of these are going to happen with the prevailing business as usual.

  • The journals, APA, and curriculum writers solutions are recursive: the psychologists themselves are integral parts of those processes. It’s possible to push on non-recursive parts, like getting a key textbook writer to include an extra chapter on probabilistic pitfalls[3], but trying to hook a key figure is difficult[4].
  • Curriculum writers set their sights on the next generation, not the current one. It seems like the curriculum is already slowly changing, but waiting for the entire field to advance “1 death at a time” is kind of slow.
  • The government is going to move slowly, and special interests like pharmaceutical companies invested in softer standards would throw up (probably non-obvious) roadblocks. Also, the APA has much more cachet with the government than me or Andrew Gelman. David and Goliath is a morality tale, not a blueprint for wild success.

    Or, more concretely, how do you get psychologists to not tell their patients to call their congressmen, because they’re being put out of a job as collateral damage in a campaign for better science?[5]

And notice that these all sum up large efforts: what does it mean to convince the government to have higher standards for the science it funds? It’s an opaque monolithic goal with an absolute ton of moving parts behind the scenes, most of which I’m blissfully ignorant of. These actions are so big that it’s easy to give in to the passive psychological warfare (ha!) and give up. It’s The Art of War, convincing people to accept defeat without even fighting by just impressing them with the apparent momentum of the problem. What could one do to turn that juggernaut?

In contrast, I want to focus on the opposite end of the scale; what if we tried to convince our lone psychology graduate student to consider better statistical methods?


But how? If you squint hard enough, it’s a sort of negotiation: we want the student to spend a non-trivial amount of time learning lots of statistics, while the student probably does not want to spend their Friday evenings reading about how to choose Bayesian priors. We need to convince the student that they should care, if not on Friday evening, then sooner rather than later.

Let’s borrow some ideas from the nauseatingly self-help book “Getting Past No”:

  1. “Go to the balcony”: make sure to step back and separate the frustration at poor science from the goal of getting better science.
  2. “Step to their side”: I imagine the psychologists would like to do good science, to take pride in their work and have it stand the test of time. However, just telling someone that there’s a replication crisis isn’t helping them deal with it, it’s putting yet another item on their stack full of things all clamoring for their attention while seeming vaguely negative. And how does it go? “No one ever got fired for choosing <field standard here>”. We will want something more positive…
  3. “Build them a golden bridge”: at the very least, we need to make it easy to use the better statistical methods[6], and offer support to those that are interested. Even better would be demonstrating that the methods we’re offering are better than the old and tired methods they’re using: for example, Jaynes recounts a story in “Probability Theory”, where geological scientists accused him of cheating because the Bayesian methods he used simply could not have been that good.

You’ll note that this is super abstract and not at all a blow-by-blow playbook for convincing anyone about scientific processes. Indeed, the entire process of starting with convincing a single graduate student is to figure out what the actual playbook is. Like in startup parlance, “do things that don’t scale”: even if I directly convinced 1 psychologist a day to use better statistical methods, America mints more than 365 psychologists in a year. But, if I instead found a message that tightly fit the profession and then posted that on the internet, there would be a chance that could take off. (More on this in the Appendix.)

At some point, it’s not enough to have a message that can convince graduate students: if we want to have an impact on timescales shorter than a generation, we’ll have to solve the hard problem of changing a field while most of the same people are working in it. So, an equally hand-wavey game plan for that scenario:

  1. Ideally, get one of their graduate students on board to provide trusted in-house expertise, and to find out what sorts of problems the research group is facing.
  2. Convince the local statistics professor to endorse you: that way, you can get past the first “this guy is a crank” filters.
  3. (¿¿¿) Somehow convince the professor, who probably wants to work more on his next grant application and less on learning arcane statistics, to consider your methods. Apply liberal carrot and stick[7] to refocus their attention on the existential threat slowly rolling towards them. (???)

I expect every community organizer to roll their eyes at my amateur hour hand waving around “and then we convince person X”. However, I am confident we do need to do the hard ground work to make the revolution happen.

In the end, I think we hope to make something like one of the following happen:

  • virally spread an 80/20 payload of better statistics among psychologists, and get a silent super majority of psychologists that all adhere on the surface to current institutional norms, but who eventually realize “wait, literally all my colleagues also think our usage of p values is silly”, so that a fast and bloodless stats revolution can happen.
  • move the psychology Overton window enough that an internal power struggle to institute better practices can plausibly succeed, led by psychologists that want to preserve the validity of their field.
  • in the course of convincing the entire field, figure out how to actually “statistical spearphish” up and coming field leaders, so they can save their field from the top[8].

So when I heard Jacob express a deep frustration to the student conveying “your methods are bad” (true) which was easily interpretable as “you should feel bad” (probably not intended), I saw the first step of the above revolution die on the vine. Telling people to feel bad (even unintentionally) is not how you win friends and influence people! To head off an obvious peanut gallery objection, it’s not like we’re allowing bad epistemology to flourish because oh no someone might find out they were wrong and feel bad so we can’t say anything ever. It is more pragmatic: compare trying to force someone to accept a new worldview, versus guiding them with a Socratic dialog to the X on the map so they unearth the truth themselves.

Maybe the common community that includes Jacob and me doesn’t want to devote the absolutely ludicrous resources needed towards reforming a field that doesn’t seem to want to save itself[9]. At the very least, though, we should try not to discourage those that come seeking knowledge, as our graduate student was.

And the alternative? That’s easy, we don’t do anything. Just let psychology spew bad results and eventually crash and bleed out, taking lent scientific credibility with it. I don’t think the field is too big to fail, but it sure would be inconvenient if it did.

(And since you’re the sort of person that reads this blog, then I might add that destroying a field focused on human-level minds right as a soft AI take off starts producing human-level complexity minds might be a poor idea[10].)

However, let’s raise the stakes: what if it’s not just psychology? I have a friend working in another soft-ish science field, closer to biology, and he reports problems there too. An upcoming post will in passing point out some problematic medical research. Again, I don’t think destroying psychology would bring down the entire scientific enterprise, but I do think destroying all fields as soft as biology would. So saving psychology is a way to find out if we can save science from statistical post-modernism; as the song goes “if you can make it there, you can make it anywhere”.

Maybe I’ll take up the cause. Maybe not[11]. If I do, more on this later.


Appendix: Other Actions, Other Considerations

Not everything is trying to convince people in 1-on-1 chats or close quarters presentations/workshops. Especially once we figure out what the scientists need and how we can get it to them, I think we’ll need:

  • better statistical material support geared towards working scientists. Similar to the website idea floated earlier in the post, having a central place that has all the practical wisdom will make it easier to scale.
  • provide better statistical packages that aren’t arcane and insane (looking at you, R), and do The Right Thing by default, and warn when you’re doing the wrong thing and why it is wrong. However, this will likely end up being in the existing statistical ecosystems like R, since that’s where the users are. Similar to the previous point, this also includes better tutorial and usage support.

Other things would help, but are harder in ways I don’t even know how to start solving:

  • Like House of Cards recommends, we could not require therapists to do original research. That’s like requiring medical students to get unrelated undergrad degrees for a touch of class around the office: expensive, inflating the need for positive research, and of dubious help. Yes, reducing credentialism is difficult.
  • Stop requiring positive results for publication. This is the problem for most scientific fields, because you need publication to become a PhD, and you need positive results to publish because negative results aren’t exciting. So you get p-hacking to get published, because you’ve told people “lol sink or swim” and by god they’re going to bring illegal floaties.
  • Or, give negative replications more weight/publication room. This would have the negative effect that it’ll probably increase animosity in the field, and professionals don’t want that, so there will still be costs to overcome. Changing the culture to detach yourself from your results will be… difficult.

[1]  Scott Alexander’s blog, Slate Star Codex, has a comment policy requiring comments be true, necessary, or kind, with at least two of those attributes.

[2]  Sure, guidelines don’t cause higher standards directly, but it makes it much easier to convince people that pay attention, especially those that aren’t already entrenched.

[3]  This specific strategy is additionally prone to failure since teachers pick and choose what material to use from the textbook buffet, so a standalone section on statistics would likely go unused. An entire textbook using unfamiliar statistics would be an even tougher sell.

[4]  In case it’s not clear: trying to convince key figures that they should do a thing is difficult, because if they were easy to convince, then every crank that walked into their office could have the key figure off on their own personal goose chase.

[5]  Yes, there isn’t a 1-to-1 mapping between demanding better statistics and putting therapists out of their job. However, if things have to become legislative, then it seems likely the entire field of psychology will be under attack, with non-trivial airtime going towards people with an axe to grind about psychology. And heaven forbid it become a partisan issue, but when has heaven ever cared?

[6]  In this regard, Stan by Andrew Gelman and co looks pretty interesting, even if I have no idea how to use it.

[7]  Yes, carrot and stick. We’ll need to introduce discussion of negative consequences sooner or later: if not the future destruction of science, then maybe something about their legacy or pride, or whatever.

[8]  Unlikely for the same reasons included in a previous footnote, but included for completeness.

[9]  The field as a whole, not counting individual people and groups.

[10]  A thousand and one objections for why this is a bad analogy spring to mind, but I think we could agree that conditional on this scenario, it couldn’t be worse to have a functioning field of psychology than not.

[11]  Remember, aversion to “someone has to, and no one else will”.

The Future of Football is too Near

Epistemic status: opinions and ranting

What does the future of football look like? Yes, it is totally about football; it starts out weird, just stick with it and you’ll get to the football[1].

Well, the rest of this post is about that story, so… spoilers ahoy.

Didn’t expect your football with a big dollop of science fiction, huh? I generally enjoyed it, which is why I made you read it: the narrative is great at progressively painting[2] an increasingly weird world, using multiple short stories to sketch out the implications of the “what if everyone stopped dying?” high concept. Some parts were cringe worthy: I would expect “if you think about it, everything is a miracle” to be overlaid on nature scenery in a flowing font and shared among people that think crystals affect your chi. Some of it was brilliant: I was a fan of using indentation/color to represent different speakers, instead of wading through “he said, she said”, which is a mechanic I hope to steal.

But you’ve read the story, you already know this. Instead, what I want to do is explore some external relationships to the story:

  1. The author did not converse with the existing universe of science fiction. If you’re going to write science fiction, especially utopic science fiction, then not engaging with existing concepts and utopias just raises endless questions. In my case, it definitely left me with a sense of fridge logic.
  2. The author sketches a world, and raises interesting questions in the tradition of science fiction that comments more about our current world and less about the world to come. Unfortunately, there’s too much emphasis on the commentary part of the story, and not on the story part. The author didn’t dialog with his characters to find out what they wanted, and instead just used his characters as a mouthpiece, which was distracting[3].

On Boredom

Much of the story revolves around people that have lived a long time, and expect to live forever. Nothing else about them has been changed, though, which means that human attention spans are suddenly a lot shorter than human lives. Boredom looms as they quickly (relative to their lifespan) run out of things to do, leading to people sitting in caves and playing the same hand held video game for hundreds of years at a time in order to stretch out the novelty value.

If you squint in just the right way, it’s a sort of crazy mirror metaphor for our lives. We joke about multiples of internet years in a calendar year, and feel a weary sense that we’ve seen it all[4], that Reddit is full of reposts and that meme is so last week. We’re the ones running out of things to do, playing games that look suspiciously similar to games made decades ago while sitting in our (man) caves.

It’s a cute thought, but I reject the notion that the best humanity could do was putting a cannon on a mountain. For example, at least one of my friends would be out driving a car in real life Rocket League, complete with giant exploding ball and a 3rd person follower camera drone, with a slavish attention to using just the right materials in order to match the game physics. Okay, fine, Rocket League is car soccer, so obviously a sci-fi story about football wouldn’t cover any Rocket League related shenanigans. However, spaceball? Roboball, either of the Frozen Cortex or NFL robot mascot variety (limbs are open season!)? Mech Warrior ball? Mariana trench ball, with a genetically modified angler fish ball?

I mean, points for putting a restaurant on a football field, but that’s just scratching the surface for all the different things you could do that would still resemble football in some shape or form.

And outside of football, there’s just so much to do. It’s the post-scarcity far future, the wish-granting telephones are raining outside. And, well… once you’ve seen what can be done, why would you go back to playing football?

  • In-story: re-freeze the ice caps and reclaim New York City. The most brute force solution is using sun shades, which is well within their technological grasp. Sure, the author wanted to advertise for climate change action, but the incongruity of “humanity has done everything and is now bored” vs “lol NYC is underwater can’t do anything about that” is jarring.
  • In-story: throw the space probes some extra batteries, or a big-ass reactor. I appreciate the in-narrative way of ending the story, but again, it just makes the humans look incompetent or uncaring.
  • In-story: become a “cyborg with laser cannons for arms and shit”. People were putting magnets inside of themselves years ago, and if they couldn’t die of sepsis, why would they stop there?
  • I refuse to believe that nerds did not get together, say “damn, we’re in a post-scarcity economy! What do we do?” and then not build a Niven-class ringworld around the sun. Or re-enact all of Star Wars, but with fully functional ships. Don’t think people would go through the work to do this? I present to you Ren Faires, Civil War re-enactments, and intricate cosplays, which most of these people are doing without a reasonable expectation of living forever.
  • Or that someone didn’t sit down and think “man, you know what a random planet needs? A huge ass blue monolith! It’s an artistic statement!” like in Zima Blue.
  • Terraform Mars[5], or uplift life on Earth, like in the aptly named Uplift series. Or seed a planet, and try to fast forward evolution[6][7]. We could call it evoball: first one to make a species that can win a football game against humans wins.
  • Maybe physics is only local: how can you be sure? What if the Zones of Thought is an actual thing? You can only check by traveling to the center and fringes of the galaxy, which are quite far away. It’s too bad the rules of the universe probably prevent cryosleep.
  • Or in a similar way, you can’t be sure that there are aliens which are more driven than you are. It’s reminiscent of trying to do acausal negotiation, or aliens growing up in a bad neighborhood (Watts short story on belligerence (pdf); Watts on organic Disneylands with no children[8]). However, there is no reason not to at least send out astrochickens to make sure.
  • If you’re really out of things to do, run timing attacks on the universe (like at the end of Accelerando) just in case we’re in a simulation.

Why doesn’t the author think there will be things to do? Reading the author’s earlier story, The Tim Tebow CFL Chronicles, makes it abundantly clear that the author has confused The Great Stagnation’s argument of “we’ve picked a bunch of low hanging fruit, so innovation will slow down” with “there will be no further innovation beyond this point”. (If that isn’t what the author was trying to say, sorry, but making two stories in a row about the same thing is how you get labeled “the guy that writes stories with talking cats”). Yes, slow down and smell the roses instead of checking Twitter again, but saying “it’s 17776, and we’re bored out of our minds” just ignores so much of what science fiction talks about[9]. Even a series focused solely on pure known physics science fiction, the Borden series of short stories, still comes up with stories worth telling and, eventually, lives worth living.

Instead of doing things, another acceptable answer is to attempt to become a bodhisattva, and spend all your time blissing out. Thinking about it, this might be how you could build the same piece of furniture 1000 times in a row, as a meditative exercise. However, the people in this story are not meditation masters: they’re just people desperately ignoring the enormity of the world around them and carrying on with a specific lifestyle brought to them by historical accident[10].

Which leads to another crazy-mirror concept of the story. “We just hang out” is “we just hang out”, and applies just as cleanly to the immortals and us. We killed god in the 1800s, and plagued ourselves with existential ennui and a fear that we’re just wasting time. The only difference is that in the story the god of death has been removed, so actions have even less direction imposed on them. The author answers obliquely by putting in multiple characters coping with living forever by choosing some objective, and then striving for it. According to my understanding, this is also how most people that grapple with “what is my purpose in life?” eventually deal with it. It doesn’t seem like the author likes that answer, but neither does he propose anything else.

On Children

A related thread hinted at in the story is the complete absence of children paired with effortless immortality. “Man, aren’t children great?” the story sighs. “They would have examined this lawn no one else has examined yet. I really wish we had children, so they could keep our world dynamic and interesting, instead of leaving it staid and boring.”

Which is fair (see Children of Men), but the author is already ignoring the children in front of him.

Admittedly, the children we know about are outside of the solar system, but they’re sentient! And furthermore, they don’t want to kill humans, or kill humans in the process of tiling the universe with atomic smiley faces! They care about football, which is a pretty human thing to do[11]!

So you can make sentient beings by feeding computers enough human culture, and seed their interests with whatever (the probes care about football due to football existing in their payload), and their seed system requirements are 1960s computers. It just takes thousands of years to grow one, which might scale with clock speed. At any rate, it beats being pregnant for 9 months, and having 3 probes become sentient with none of them turning out to be psychotic is a pretty good sign. And since they’re in silicon, they don’t have to only come in probe form, although they can if they want. Having only a few people wanting to take care of new robots should still result in a population explosion (see Down and Out in the Magic Kingdom), especially if even a small fraction of the robots want to hang out in the real world. And with each computational upgrade, the robots would become more like the overminds in The Culture, and the shaping of the world would become their story, not ours. Or, we might end up like Solarians in Asimov’s The Naked Sun, where each person has a cadre of robots and eschews human contact.

Do emulations count as humans? If you record all the electrical activity in the brain at the same time (which should be trivial in a world that already has nanotech), and have good enough physics models on fast enough computers, you can run existing humans in silicon also. Sure, they aren’t children per se, but after spending time copying themselves into clans, their societies will probably seem weird enough that they basically are. (See: Age of Em, Revelation Space’s discussion of alpha simulations, The Quantum Thief series)

Can humans simply be printed? Similar to these other suggestions, we know what a human is, and have precision nanotech, so the most brute force thing to do is just take some simulated DNA, stick it in simulated sex cells, and then run the physics models forward until the baby would be born, and then build the baby on a molecular level. Of course there are problems with this: Smalley convinced me years ago that nanotech is fabulously more difficult than nanotech pioneers like Drexler sell it as. However, we’re in 17776, and we’ve already hand waved these problems away with the fiat introduction of nanos.

Both the human-related creation methods, though, probably fall under the purview of “no more human children”, which neatly explains why no one is doing it.

But we’ve already seen that creating more electronic minds is possible: hell, that’s the whole opening conceit. Then, why aren’t there more of them? The unlikely answer is that no one wanted to make them: if nothing else, some enterprising human would figure out how to deliver new electronic minds closely matching human children in android form. The sinister answer is that the same mechanism that prevents conceiving babies prevents the deliberate creation of new minds in general (see the Greg Egan story Crystal Nights).

If that is true, how do you get children again? The answer is simple: kill god (for real this time).

You would solve two goals with one stone: fighting against a fantastically powerful entity means there is no reason to mope around in a facsimile of the 1990s, and if you win you would remove the restrictions placed on humanity. Pining after “true, unfabricated struggle”? You got it.

You can’t kill god, you say? I’ve never liked that people said that god is outside the magisterium of physics, when any link to the theological could be exploited to bring it into physics. Some elaboration: one model of the way we found atomic nuclei is by shooting particles at a very thin piece of metal foil, and seeing what happened. It’s a lot like throwing billiard balls into a box to figure out what’s in the box by how they bounce back and deflect. So, throw billiard balls at god, and see how they bounce back: the theological consumed by physics (or the other way around). Yes, god is traditionally much more complicated than the atomic structure, but then you could roughly model psychology as throwing (metaphorical) billiard balls at humans. The bottom line? GIT GUD at throwing billiard balls, or GTFO.

And to those that think it’s easy to get to know god, but impossible to move it? It would be giving up too soon to not even try; it’s not clear if they’re in an AI box, and you don’t know you can talk your way out of the box until you try. Better GIT GUD at talking to alien minds.

And if god is watching your thoughts, and changing them as you speak? All I can say is GIT GUD[12], and good luck.

On Power

I do appreciate that the frame of the story stays the same as our current time, because trying to write post-singularity fiction is a shitshow.

However, it’s not just that life being basically the same is unbelievable; I also find the power structures implausible as they stand.

Consider money. The cashier saying “want any money?” is super cute, a great overturning of our expectations about the economy. However, why would society still agree to have money? “Money was a horrifying abstraction that I had to scrape together in the past to make rent, but instead of saying FUCK YOU to money when we could, we decided to keep it around.” What?

Consider religion. “The Wages of Sin are Death”? Not anymore, sucker! Religions are memetic, and the old salvation and morality memes based on an afterlife won’t cut it anymore. What about a religion that preaches “if only the entire world believes, then the curse of eternal life will be lifted, and then we may enter Valhalla”? Or, “the Wages of Sin must be Death: if God has forsaken the world and will not accept us, then we will have to do it ourselves”. And no matter which religions develop, there would be no bumbling missionaries that can barely preach to a crowd of one, because every missionary would have had 10,000 hours of preaching practice many times over.

Consider nationality. In a new world, what Kurd would agree to the Turkey/Iraq borders as they are drawn? What Israeli or Palestinian would agree to the current borders? I’m skeptical about much of Africa keeping its internal shape, with colonial borders drawn willy nilly according to European dictates. Or for an example close to my heart, there were rumblings of splitting Washington state into Washington and Cascadia, to match the cultural divide of the state. In the limit, how can the current nation-states be stable, in the face of a vastly different world? Now, if everyone today had open borders, I would find this description of the future more compelling.

Well, maybe the states no longer actually carry weight: what is there to administer in a post-scarcity society? Well, there is conflict mediation, and there must be conflicts once you can print nuclear bombs. “Your stupid octagonal building is stupid, and you’re stupid!” they say right before nuking said building. Well, you would hope that once you got that old, you would be more gentle and understanding, more wise. We all know that older people are simply not petty, right? Adults could not have been involved in MsScribe. Old people don’t hold on to grudges, or get into inane fights with their neighborhood Homeowners’ Association (or see the spats in Disneyland in Down and Out in the Magic Kingdom). Or someone decides to be artistic and turn all of the Americas into a blank clean white canvas (also see XKCD #861), and hacks the nanos to do the deed. Again, goodbye old building, just this time it’s every old building in North America.

Or to put it in a less violent way: who decides what happens to the original Monets? Sure, scan it and re-constitute it atom for atom: we know elementary particles are interchangeable, so the copies would be effectively the original according to any conceivable test. And when people insist on a particular set of atoms that happen to maximally match our sense of continuity? Post-scarcity removes scarcity from an increasing set of things, but humans insist on keeping some scarce things. Spouse? Accept no substitutes, not conjured companions nor 30,000 grapefruits! Or intangibly, all needs can be met, except for the need for relationships and status and wanting to be the very best, like no one ever was. And we’re going to mediate these conflicts with 20th century states?

I’d expect something closer to The Archipelago, where folks associate with the people they want to associate with. When you don’t have anything but status games to play, why would you play them with people you don’t like, or who refuse to play the same status games? “I can recall a million digits of pi and can dunk on you, but you insist you’re better because you can recite all the lines of Sailor Moon and wrestle sharks.” If you squint, it’s just extrapolating from the existing trends: with the internet, we got a fantastic fragmentation of communities, each focused on their thing. Also see The Diamond Age: when it’s possible to just raise an island in the middle of the ocean and go live there with your friends who are weirdly into Victorian era top hats, instead of living next to the people that loudly insist fedoras are far superior, why wouldn’t you?

Back to the story: let’s grant that there’s still power, possibly in the form of a state, possibly in a way that closely approximates 20th century power structures with a president and all that. Let’s say that some authoritarian state made it to 17776 without overthrowing their dictator, but it finally happens in 17777. There’s a lot of pent up frustration with the dictator, but they can’t simply execute him; besides, execution might be too good for him. What do they do?

On Darkness

So everything’s been fun and games up until now. What would people build with all the free time? Why aren’t there children, even though you can’t have children? (Life, uh, finds a way) Why are the power structures the same?

Well, what could go wrong?

Trigger warnings: torture, mind fuckery, suicide.

Let’s go back to the dictator. What if he was thrown into the sun? He’s obviously going to live, since the rules of the universe enforce that. However, he’s stuck in a 15,000,000 °C furnace. Depending on exactly how the rules of immortality work, he might be crushed. He would definitely be burning, or if everything except his brain is burned away, then living in enforced solitary confinement with no sensory input. If no one wants to dig him out of the sun, then he could stay there for a very, very long time. (Also see the Priest’s story in Hyperion).

Maybe simple burning for eons on end is too good for your enemies. Metamorphosis of Prime Intellect directly tackles this, where the application of endless torture permanently damages at least one person. Of course, the nanos are there to keep humans safe from each other, but all systems can be defeated, and as the tag line of Alien goes, no one can hear you scream in space.

If you stick a pole through someone’s brain, does it give them seizures, or does it maintain their previous mental state, or does the pole simply bounce off? If the powers that be just protect the person against physical assault, something more subtle might work; you can use magnets to induce changes in mind state in people. Watts extrapolates this to maintaining religion in his “A Word for Heathens” short story. Hey presto, Stockholm Syndrome in an MRI! It’s known that brainwashing doesn’t work[13], but things might change when you can actually alter thoughts in flight, or have enough time to experiment with changing the brain chemistry of a person.

Speaking of mind alterations, why are the streets of 17776 so full? They should be emptied by the final drug, wireheading. Just stimulate your reward centers in your brain, and do it endlessly. There would be problems with adapting to the constant stimulation, but I’m sure it could be worked out by 17776. Imagine: you can’t die, but you’re bored. You’ve played football for ten thousand years in a row, and five thousand years ago you were ready to die, having lived a full life. But the kids a street over are talking about a way out. You’ll live forever, and you won’t care, because you’ll be maximally blissed out. Once you wirehead, you won’t decide “man, my life kind of sucks, I should do something else” because nothing would suck, forever. And if even a tiny proportion of people each year decide to wirehead, over time the wireheading population subsumes the human population (see this fictional supporting report for Echopraxia). Eventually, everyone will be smiling, and they won’t be creatures of play, they’ll be creatures of happiness[14], forever.

Or maybe the powers that be decide that these outcomes are too horrifying to allow, so “artificial” modification by electrical or chemical or crowbar means is disallowed. Well, we have depressed people that we help with drugs: are they denied their mind altering chemicals? Did this god doom schizophrenics to an eternity of delusions? Is there some population off-screen that can only lie in bed, hopeless for either positive help or the sweet release of death?


Perhaps you understand now what I mean when I say that The Future of Football is the singularity for noobs. Compared to existing sci-fi options the story is kind of bland, where nothing exciting nor anything too terrible has happened. It’s great for beginners, though, who haven’t really grappled with living forever or being in a high tech post-scarcity society, and need that “see, the future isn’t too wild, but why not think about these ideas?” headfake to get them to consider it[15].

Again, it’s a fine story; it doesn’t deserve a moniker like “a story for those that haven’t thought about the far future before, and won’t think about the far future again”. However, I think it does function best as a gateway drug into a whole universe of science fiction all excitedly dialoging about the products of our accumulated imaginations.


[1]  Of course it’s American football.

[2]  Pun alert: in the author’s earlier Tim Tebow CFL Chronicles, he refers to the images as paintings, when they’re just images processed with Photoshop filters.

However, this is similar to the econo-art idea I had. It’s derived from eco-rounds in Counter-Strike, where players leverage lower-cost equipment to save up for a buy round. Econo-art is just low-cost art, which is just good enough to get the point across. You could spend lots of time making a single beautiful pre-photography realism style painting, or you could apply some Photoshop filters and finish the story in a reasonable time. Maybe more on this in the future.

[3]  “Didn’t you enjoy Harry Potter and the Methods of Rationality?” Well, I also enjoyed this story, so…

[4]  Possible counter: there used to be a lot more variation, but we’ve killed most of it as collateral damage in making the world legible. Thank the gods for global street-by-street GPS navigation, but we’ve lost our cheeses, and I don’t think they’re coming back in 3 days (but see these comments for discussion).

[5]  Some people just really love taking care of animals.

[6]  Fast forward evolution? For example, Seveneves muses on using epigenetic flexibility in order to adapt organisms more quickly to changing environments. This specific example probably can’t be made to work, even for 17776 societies, but there’s probably a lot of study to be brought to bear on genetics.

[7]  There’s an open question about whether running evolution is ethical, since most evolution involves actual death, and probably lots of suffering. I’ll let the 17776 ethics board weigh in on that.

[8]  I broke my back lifting Moloch to heaven, and all I got was a Disneyland with no children. If consciousness turns out to be extraneous, then it might get weeded out in intense competition, which is bad news for us people that value being conscious.

[9]  It doesn’t help that the author sketches a game labeled “capitalism run amok” with a critical tone (which looks an awful lot more like 4chan run amok). Sure, reject creative destruction (which, to be fair, becomes a lot less horrifying when you can no longer freeze to death on the streets), but then complaining that no one is making things to do is somewhat incongruous.

[10]  I also think the argument that “people rejected things that caused them to lose their connection to humanity” implies a different end state than the author paints. Why stop at the 1990s? Why not dribble it all the way back to the savanna, a sort of uber-Amish lifestyle that puts us back into the long childhood of humanity?

Then there’s the argument that people want things to stay the same, citing people that get older and keep everything the same. It’s certainly poetic, “Plates and portraits… would leave unbleached shadows of themselves in the paint”. However, I think the author is ignoring that as people get older, they break down: everything starts to hurt, mental agility declines. Why would they move the couch when they would throw out their back? Let’s reverse aging, and then see what the old folks do.

[11]  Future AI researchers that might be reading this, please do not use “love of football” as an AI friendliness metric. It will go poorly.

[12]  First, a horror story about ideas that cannot be thought, and ideas that can eat you alive. Sound similar?

As far as we know, there’s no telekill material in the universe. What could we plausibly do? One way to try and combat mind reading is to first scan your mind into a computer, and then homomorphically encrypt the scan, and then run it forwards with homomorphic encryption operations to simulate physics while feeding in things about the world. That way you can “think” about the problem without making it easy for god. Sure, once god notices, it would look for the encryption keys, or would keep watch for malicious thoughts joined with thoughts about homomorphic encryption, but these are both a bit harder than just looking for a mind thinking about overthrowing god. If you cannot win, and refuse to lose, impose costs.

[13]  I remember reading this from a semi-trusted source, but now I can’t find it, and can only find articles conveying “lol are your children being BRAINWASHED into a CULT?”.

[14]  My impression is that you will find wireheading abhorrent. “I almost felt transcendent joy. It was awful.” What matters is not that you find it abhorrent now, but whether you will always consider it abhorrent over the next 15,759 years. Without ironclad norms against wireheading, people will eventually try it.

[15]  Associated idea: future shock levels. It’s from 1999, which means that it’s woefully out of date, but the general idea still holds.

Thoughts on My Tribe

Epistemic status: feelings and intuitions.

I’m an aspiring rationalist[1], and I count myself as a part of my local rationality interested community.

And it’s wonderful that the community is here! I can confidently say that if it weren’t, I would be less the sort of person I want to be[2]. It introduced me in quick succession to lots of intelligent people, a series of thoughtful ideas, and immersed me in an infectious self-improvement environment. In a more hands on way, it gave me valuable first lessons in people management when I found myself growing into the de facto leader of a rationalist group house. And, it gave me a people I could call my people[3].

But lately, the community has been dragging on my soul.

The drag is low-grade apprehension, because our de facto leader is leaving for that galactic attractor, The Great Bay Hole[4]. We’ve seen this story before[5]; one person steps up to run things, making sure that meetups happen and generally keeping up the community. Unfortunately, there are two ways this falls apart: first, the sort of person that becomes an energetic charismatic leader tends to reason themselves into a corner that requires them to move to The Bay so they can Save The World[6]. Or, if there’s something keeping them from moving to The Bay, then whatever that is can suddenly require more from them, so the person has to load shed, and leading the community will go out the window before whatever the Bay-Blocking Important Thing is. Either way, they end up leaving after a stint as the local community leader, leaving a vacuum of responsibility, which usually one person steps up to fill…

This sort of arrangement can be sustainably unsustainable, if there’s enough new energetic people that stick around for a few years before abdicating their position. However, there isn’t currently a clear energetic charismatic leader candidate. The people with babies? The people that will have babies soon? The people busy with school? The people busy with work? The people with unfortunate amounts of anxiety? Me?

I could talk at length about the different ways gardening[7] the community is a thorny proposition: I agonized over a few drafts of this post that were all about those difficulties[8]. However, most of the musing was quite abstract, and after thinking about it I realized that while most of my concerns were relevant, they weren’t ultimately important: they didn’t get at the heart of why I felt apprehensive.

The heart of my apprehension is a fear of ending up alone, becoming the sole person putting in non-trivial effort to keep the dream alive. If I pick up the mantle, then it becomes difficult to put back down; don’t I have a responsibility to the community that helped me so much? I should just suck it up and focus more and more energy on the management of the community. And then one day I’ll find myself muttering “somebody has to, and no one else will”, the same thing I internal monologued while burning myself out running the group house, and a phrase I firmly believe should be reserved for profoundly tragic figures, not your everyday run of the mill humans[9].

Yes, dropping responsibilities on the floor was/is/will always be an option: the global BATNA[10] has never been more amenable. But being the last one turning out the lights is sad, like I personally doomed my people to astronomical rents[11], horrific public transportation[12], a boring culture, and pleasant year-round weather. And when I consider the possible outcomes, it’s failure that weighs on my mind. Better to never try, instead of putting in a heroic effort and then seeing it all fall apart in the end anyways.

Well, when you put it that way the counter is obvious: don’t focus solely on the negative outcomes, duh. My counter-counter is wrapped up in the sprawling unpublished essay[13] on my expected cost/benefit for community gardening: high cost, potentially high reward with high variability. Against this uncertainty, I have a menagerie of personal hopes and dreams, a todo list the length of my arm with little of it directly tied to running a community. Is it worth it for me to step up into the energetic charismatic leader[14] role? To put it mildly, I’m uncertain, and it doesn’t help that even when I try to plug my ears, I still hear the doom and gloom rolling in over the community.


This story has a tentatively happy ending.

A recent[15] meetup tried to figure out what the group would be doing, and several people stepped up to take on temporary shared responsibility, with more people tentatively waiting in the wings if things fell through. We’ve tried a similar leadership sharing scheme before, which failed after a brief stint, but we haven’t tried it more than once and in this particular configuration, which makes this “an interesting experiment” and not “the definition of insanity”.

Yes, really, the fact that I physically saw a handful of people willing and able to help, not including myself, really upped my probability[16] that things could keep functioning, and not on Ye Olde Single Energetic Leader model we’ve been chugging forward with. I know from my experiences with the group house that foisting everything on one person produces less work overall, since there are no communication costs. However, not paying the initial costs to make sure people can provide extra capacity means any bump in workload is really a bump in workload for one person, who can’t delegate because the scaffolding isn’t in place. Plus, there’s no redundancy. Therefore, it’s worth the upfront costs to spread the work around, which makes this shared responsibility scheme exciting[17].

It’s not “everything is easy now, and nothing could possibly go wrong”; there are still real costs, and the problems to overcome are still difficult[18]. It’s more about putting to rest the feeling that “I’m the last line, and here I will make my stand alone” and transmuting it into “if I’m part of a last line, then at least I won’t be alone”. Which is the sort of thing you don’t alieve[19] until you see it moving into action, with the ideas and commitment rolling in.

Maybe everything isn’t hopeless bullshit. We’ll keep flying this plane, and with a little elbow grease, maybe we’ll fly it into the last sunset.


[1]  If you’re not familiar, it’s the sort of new wave rationalism originally based out of LessWrong (notably a shadow of its former self now), not the sort of enlightenment rationalism that insisted the world had to make sense, and damn it humanity had a duty to change the world if it didn’t conform. As it turns out, this old-school worldview runs into problems, which we hope to avoid.

[2]  Keeping in mind that being part of the community has most certainly altered my idea of what an ideal version of myself looks like.

[3]  Scott Alexander just recently talked about this, explaining it’s his karass, and I’m sure it’s my karass as well.

[4]  The global community started with an unusual number of people in The Bay, and because we’re not really on board with inefficiencies, obviously moving to the place with all your online friends makes sense. Once you’ve moved, it makes even more sense for your friends to move to The Bay…

[5]  Simplified account is simplified.

[6]  If it’s not Save The World, it might even be as simple as “it’s easier to run my startup there” or “all the people I want to collaborate with are there”.

[7]  “Gardening a community” is a nice way to formulate community growth, which I’ve stolen from Scott’s “In Favor of Niceness, Community, and Civilization”.

[8]  In abbreviated form, what I think makes gardening the community hard: our standards are high, so putting together content is daunting. We’re concerned about using rationality instrumentally, but our interests are varied, so it’s hard to get enough people together to put things to practice on the same target. Doing original work is difficult, since a lot of the low hanging fruits have been picked. We’re drawn from contrarian-heavy populations. Relatedly, we value truth over conformity, even if it makes things more inefficient. Generally, the modern community BATNA means it’s easier to leave groups with difficult problems, even if they are also important problems. Management of a community is not the same thing at all as dealing with whatever the community is focused on, so management is a chore instead of something that comes naturally. Personally, I have truncated social needs, so I would be okay as a Seder/Solstice rationalist (analog with Christmas/Easter Christian). I also think I’m missing a formative experience of exploratory collaboration that the community facilitated, which would help me feel that the community is important.

[9]  There are things worth doing this for, like challenging hell, but no matter how you cut it “running a meetup” is not in the same reference class.

[10]  Best Alternative To Negotiated Agreement (BATNA). It’s never been easier to simply leave; there are groups looking for members everywhere, and you’re not stuck in one village your entire life.

[11]  I recognize that saying this from the NYC metropolitan area is lol worthy, but at least we have a proper city.

[12]  DAMMIT NJ TRANSIT YOU ARE NOT HELPING.

[13]  The aforementioned first drafts of this post are basically that essay.

[14]  I’d have to work on the charisma. And the energetic also, probably. And, well, if we’re being honest, the leader part too…

[15]  It’s not-so-recent by this point; this post is like 2-3 months late on a timely issue, which doesn’t really work. Well, something for me to work towards.

[16]  I was also surprised by the extent to which I was moved by the social proof of people earnestly discussing things in a room.

[17]  I also recognize that if I have too much of a hand in designing this sort of organizational scaffolding, I’d probably be prone to second system effects. Something to watch out for.

[18]  We’re not even talking about the really hard sorts of problems, like solving climate engineering, nuclear proliferation, or intelligence foom scenarios. They’re much more mundane, like “what should we talk about next week?” or “how many game nights per month is too much?”. Solving the mundane problems will hopefully help progress on the harder problems, but we’ll have to see how that pans out.

[19]  Useful concept alert: you know something, you believe it. But do you feel it in your gut, do you alieve it?

Sandbox Statistics: Methodological Insurgency Edition

Epistemic status: almost certainly did something egregiously wrong, and I would really like to know what it is.


I collect papers. Whenever I run across an interesting paper, on the internet or referenced from another paper, I add it to a list for later perusal. Since I'm not discriminating in what I add, I add more things than I can ever hope to read. However, "What Went Right and What Went Wrong": An Analysis of 155 Postmortems from Game Development (PDF) caught my eye: an empirical software engineering process paper, doing a postmortem on the process of doing postmortems? That seemed relevant to me, a software engineer that does wrong things once in a while, so I pulled it out of paper-reading purgatory and went through it.

The paper authors wanted to study whether different sorts of game developers had systematic strengths and weaknesses: for example, they wanted to know whether "a larger team produces a game with better gameplay." To this end, they gathered 155 postmortems off Gamasutra[1] and encoded each one[2] as claiming a positive or negative outcome for a part of the game development process, like art direction or budget. They then correlated these outcomes to developer attributes, looking for systematic differences in outcomes between different sorts of developers.

I'll be upfront: there are some problems with the paper, prime of which is that the authors are a little too credulous given the public nature of the postmortems. As noted on Hacker News, the companies posting these postmortems are strongly disincentivised from actually being honest; publicly badmouthing a business partner is bad for business, airing the company's dirty laundry is bad for business, and even saying "we suck" is bad for business. Unless the company is going under, there's little pressure to put out the whole truth and nothing but the truth, and instead a whole lot of pressure to omit the hard parts of the truth, maybe even shade the truth[3]. It's difficult to say that conclusions built on this unstable foundation are ultimately true. A second problem is the absence of any discussion of statistical significance; without knowing if statistical rigor was present, we don't know if any conclusions drawn are indistinguishable from noise.

We can't do much about the probably shaded truth in the source material, but we might be able to do something about the lack of statistical rigor. The authors graciously publicized their data[4], so we can run our own analyses using the same data they used. Of course, any conclusions we draw are still suspect, but it means even if I screw up the analysis, the worst that could happen is some embarrassment to myself: if I end up prescribing practicing power poses in the mirror or telling Congress that cigarettes are great, no one should be listening to me, since they already know my source data is questionable.

Now we have a sandbox problem and sandbox data: how do we go about finding statistically significant conclusions?

p-values

Before we dive in, a quick primer about p-values[5]. If you want more than this briefest of primers, check out the Wikipedia article on p-values for more details.

Roughly speaking, a p-value is the chance of seeing data at least as extreme as ours if the null hypothesis, the boring, no interesting effect result, is true. The lower the p-value is, the harder it is to explain the data with the boring outcome.

For example, if we're testing for a loaded coin, our boring null hypothesis is "the coin is fair". If we flip a coin 3 times, and it comes up heads twice, how do we decide how likely it is that a fair coin would generate this data? Assuming that the coin is fair, it's easy to see that the probability of a specific sequence of heads and tails, like HTH, is (\frac{1}{2})^3 = \frac{1}{8}. We need to use some combinatorial math in order to find the probability of 2 heads and 1 tail in any order. We can use the "choose" operation to calculate that 3 \text{ choose } 2 = {{3}\choose{2}} = 3 different outcomes match 2 heads and 1 tail. With 3 coin flips, there are 8 equally probable outcomes possible, so our final probability of 2 heads and 1 tail in any order is 3/8.

However, neither of these are the probability of the coin being fair. Intuitively, the weirder the data, the less weight we should give to the null hypothesis: if we end up with 20 heads and 2 tails, we should be suspicious that the coin is not fair. We don't want to simply use the probability of the outcome itself, though: ending up with one of 100 equally probable outcomes is unremarkable (one of them had to win, and they were all equally likely to win), while ending up with an unlikely option instead of a likely option is remarkable. By analogy, receiving 1 inch of rain instead of 1.1 inches in Seattle is unremarkable, even if getting exactly 1 inch of rain is unlikely. Receiving any rain at all in the Sahara Desert is remarkable, even if it's the same probability as getting exactly 1 inch of rain in Seattle. The weirdness of our data depends not just on the probability of the event itself, but on the probability of other events in our universe of possibility.

The p-value is a way to solidify this reasoning: instead of using the probability of the outcome itself, it is the sum of the probability of all outcomes equally or less probable than the event we saw[6]. In the coin case, we would add the probability of 2 heads and 1 tail (3/8) with the probability of the more extreme results, all heads (1/8), for p=0.5.

But wait! Do we also consider a result of all tails to be more extreme than our result? If we only consider head-heavy results in our analysis, that is known as a one-tailed analysis. If we stay with a one-tailed analysis, then we will in essence be stating that we knew all along that the coin would always have more heads in a sequence, and we only wanted to know by how much it did so. This obviously does not hold in our case: we started by assuming the coin was fair, not loaded, so tails-heavy outcomes are just as extreme as heads-heavy outcomes and should be included. When we do so, we end up with p=1.0: the data fits the null hypothesis closely[7]. One-tailed analysis is only useful in specific cases, and I'll be damned if I fully understand those cases, so we'll stick to two-tailed analyses throughout the rest of this post.
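The coin arithmetic above is small enough to check by brute force. Here is a minimal sketch in Python (the function names are mine, made up for illustration) that uses exact fractions and the "sum everything equally or less probable" definition from earlier:

```python
from fractions import Fraction
from math import comb


def binomial_pmf(heads, flips, p_heads=Fraction(1, 2)):
    """Probability of exactly `heads` heads in `flips` flips of the coin."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


def one_tailed_p(heads, flips):
    """Only count outcomes with at least as many heads as we observed."""
    return sum(binomial_pmf(k, flips) for k in range(heads, flips + 1))


def two_tailed_p(heads, flips):
    """Count every outcome that is as probable as, or less probable than, ours."""
    observed = binomial_pmf(heads, flips)
    return sum(
        binomial_pmf(k, flips)
        for k in range(flips + 1)
        if binomial_pmf(k, flips) <= observed
    )


print(one_tailed_p(2, 3))  # 1/2
print(two_tailed_p(2, 3))  # 1
```

Swapping in `two_tailed_p(20, 22)` gives the p-value for the suspicious 20 heads and 2 tails coin mentioned earlier.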

If there were only ever two hypotheses, like the coin being fair, or not, then rejecting one implies the other. However, note that rejecting the null hypothesis says nothing about choosing between multiple other hypotheses, like whether the coin is biased towards the head or tail, or by how much a coin is biased. Those questions are certainly answerable, but not with the original p-value.

How low a p-value is low enough? Traditionally, scientists have treated p<0.05 as the threshold of statistical significance: if the null hypothesis were true, it would generate data this extreme less than 1/20th of the time purely by chance, which is pretty unlikely, so we should feel safe rejecting the boring null hypothesis[8].

There are problems with holding the p<0.05 threshold as sacrosanct: it turns out making p=0.05 a threshold for publication means all sorts of fudging with the p-value (p-hacking) happens[9], which is playing a part in psychology's replication crisis, which is where the 2nd part of this post's title comes from[10].

For these reasons, the p-value is a somewhat fragile tool. However, it's the one we'll be using today.

Adjusting expectations

The first step is simple: before looking at any of the data, can we know whether any conclusions are even possible?

That means doing a power analysis, to find out whether 155 postmortems is enough data to produce significant results. To start, we need to choose an expected effect size we think our data will display: usual values range from 0.1 (a weak effect) to 0.5 (a strong effect). Yes, it's subjective what you choose. We already know how many data points we have, 155 (normally we would be solving for this value, to see how big our sample size would have to be). Now, I'm not going to calculate this by hand, and will instead use R, a commonly used statistical analysis tool (for details on running this, see the appendix below). Choosing a "medium" effect size of 0.3 with n=155 data points tells us that we have a projected 25% false negative rate, a ~1/4 chance to miss an existing effect purely by chance. It's not really a great outlook, but we can't go back and gather more data, so we'll just have to temper our expectations and live with it.

What about looking at other parts of the general experiment? One potential problem that pops out is the sheer number of variables that the experiment considers. There are 3 independent variables (company attributes), and 22 dependent variables (process outcomes) that we think the independent variables affect, for a total of 3\cdot 22=66 different correlations that we are looking at separately. This is a breeding ground for the multiple comparisons problem: comparing multiple results against the same significance threshold increases the chances that at least one conclusion is falsely accepted (see this XKCD for a pictorial example). If you want to hold steady the chances that every conclusion you accept is statistically significant, then you need to make the evidential threshold for each individual correlation stricter.

But how much stricter? Well, we can pick between the Bonferroni, the Sidak, and the Holm-Bonferroni methods.

The Bonferroni method simply takes your overall threshold of evidence, and divides by the number of tests you are doing to get the threshold of evidence for any one comparison. If you have m=5 tests, then you have to be 5 times as strict, so 0.05/5 = 0.01. This is a stricter restriction than necessary: however, it's easy to calculate, and it turns out to be a pretty good estimate.

The Sidak method calculates the exact per-comparison threshold of evidence given the overall threshold. The previous method, the Bonferroni, is fast to calculate, but it calls some outcomes insignificant when it in fact has enough evidence to label those outcomes as significant. The Sidak method correctly marks those outcomes as significant, in exchange for a slightly more difficult calculation. The equation is:

p_{comparison} = 1 - (1 - p_{overall})^{1/m}

There's some intuition for why this works in a footnote [11].

If p_{overall}=0.05 (as is tradition) and m=5, then p_{comparison}=0.0102. This is not that much less strict than the Bonferroni bound, which is simply p_{Bonferroni}=0.01, but sometimes you just need that extra leeway.
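Both thresholds are one-liners, so here is a quick sketch for checking the numbers (the function names are my own, not from any library):

```python
def bonferroni_threshold(p_overall, m):
    """Per-comparison threshold: split the overall threshold evenly over m tests."""
    return p_overall / m


def sidak_threshold(p_overall, m):
    """Per-comparison threshold from the Sidak equation above."""
    return 1 - (1 - p_overall) ** (1 / m)


for m in (5, 66):
    print(m, bonferroni_threshold(0.05, m), sidak_threshold(0.05, m))
```

The Sidak threshold always comes out slightly above the Bonferroni one, which is the "extra leeway" in question.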

The Holm-Bonferroni method takes a different tack: instead of asking each comparison to pass a stringent test, it asks only some tests to pass the strict tests, and then allows successive tests to meet less strict standards.

We want to end up with an experiment-wide significance threshold of 0.05, so we sort the p-values from low to high and ask whether each one is beneath the threshold divided by the number of p-values not yet checked (counting itself), and stop considering results significant once we reach a p-value that doesn't meet its threshold. For example: let's say that we have 5 p-values, ordered from low to high: 0.0001, 0.009, 0.017, 0.02, 0.047. Going in order, 0.0001 < 0.05/5 = 0.01, and 0.009 < 0.05/4 = 0.0125, but 0.017 > 0.05/3 = 0.0167, so we stop and consider the first two results significant, and reject the rest.
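The step-down procedure is easier to follow as a loop. This is a minimal sketch under my own naming (`holm_bonferroni` is a made-up helper, not a library call), using the same strict "beneath the threshold" comparison as the worked example:

```python
def holm_bonferroni(p_values, p_overall=0.05):
    """Return the p-values that count as significant under Holm-Bonferroni."""
    m = len(p_values)
    significant = []
    for rank, p in enumerate(sorted(p_values)):
        # The first p-value is held to p_overall / m, the next to
        # p_overall / (m - 1), and so on, loosening as we go.
        if p >= p_overall / (m - rank):
            break  # everything from here on is rejected
        significant.append(p)
    return significant


print(holm_bonferroni([0.0001, 0.009, 0.017, 0.02, 0.047]))
```

Note that 0.02 and 0.047 are never checked against their own (looser) thresholds: once one p-value fails, the rest are out.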

There is a marvelous proof detailing why this works which is too large for this post, so I will instead direct you to Wikipedia for the gory details.

With these methods, if we wanted to maintain a traditional p=0.05 threshold with m=66 different comparisons, we need to measure each individual comparison[12] against a p-value of:

p_{Bonferroni}=0.000758
p_{Sidak}=0.000777
p_{Holm}=(\text{between } 0.000758 \text{ and } 0.05)

We haven't even looked at the data, but we're already seeing that we need to meet strict standards of evidence, far beyond the traditional 0.05 threshold. And with n=155 data points at best (not all the postmortems touch on every outcome), it seems unlikely that we can meet these standards.

Perhaps I spoke too soon, though: can the data hit our ambitious p-value goals?

Testing the data

So how do we get p-values out of the data we have been given?

Keep in mind that we're interested in comparing different proportions of "this went well" and "this went poorly" responses for different types of companies, and asking ourselves whether there's any difference between the types of companies. We don't care about whether one population is better or worse, just that they have different enough outcomes. In other words, we're interested in whether the different populations of companies have the same proportional mean.

We'll use what's known as a contingency table to organize the data for each test. For instance, let's say that we're looking at whether small or large companies are better at doing art, which will produce the following table:

            Small Company   Large Company
Good Art         28              16
Bad Art          12               6

We want to compare the columns, and decide whether they look like they're being drawn from the same source (our null hypothesis). This formulation is nice, because it makes obvious that the more data we have, the more similar we expect the columns to look due to the law of large numbers. But how do we compare the columns in a rigorous way? I mean, they look like they have pretty similar proportions; how different can the proportions in each column get before they are too different? It turns out that we have different choices available to determine how far is too far.

z-test, t-test

The straightforward option is built in to R, called prop.test. Standing for "proportionality test", it returns a p-value for the null hypothesis that two populations have the same proportions of outcomes, which is exactly what we want.

However, a little digging shows that there are some problematic assumptions hidden behind the promising name. Namely, prop.test is based on the z-test[13], which is built on the chi-squared test, which is built on the assumption that large sample sizes are available. Looking at our data, it's clear our samples are not large: a majority of the comparisons are built on less than 40 data points. prop.test handily has an option to overcome this, known as Yates continuity correction, which corrects p-values for small sample sizes. However, people on CrossValidated don't trust Yates, and given that I don't understand what the correction is doing, we probably shouldn't either.
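To make the large-sample approach concrete, here is a sketch of a plain two-proportion z-test in Python, run on the art table from earlier. The function name is mine, and I'm assuming this pooled, uncorrected form is what prop.test boils down to when the continuity correction is switched off; I haven't checked it against R's output.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(good_a, total_a, good_b, total_b):
    """Two-sided z-test for equal proportions, using the pooled estimate."""
    rate_a = good_a / total_a
    rate_b = good_b / total_b
    pooled = (good_a + good_b) / (total_a + total_b)
    # Standard error of the difference, assuming both groups share one rate.
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Small companies: 28 good out of 40. Large companies: 16 good out of 22.
z, p_value = two_proportion_z_test(28, 40, 16, 22)
print(z, p_value)
```

The normal curve standing in for the real distribution is exactly the large-sample assumption that makes this shaky with only a few dozen data points.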

Instead, we should switch from using the z-test to using the t-test: Student's t-test makes no assumptions about how large our sample sizes are, and does not need any questionable corrections. It's a little harder to use than the z-test, especially since we can't make assumptions about variance, but worth the effort.

Fisher

However, the t-test still makes an assumption that the populations being compared are drawn from a normal distribution. Is our data normal? I don't know, how do you even see if binary data (good/bad) is normal? It would be great if we could just sidestep this, and use a test that didn't assume our data was normal.

It turns out that one of the first usages of p-values matches our desires exactly. Fisher's exact test was devised for the "lady tasting tea" experiment, which tested whether someone could tell whether the milk had been added to the tea, instead of vice versa[14]. This test is pretty close to what we want, and has the nice property that it is exact: unlike the t-test, it is not an approximation based on an assumption of normal data.

Note that the test is close, but not exactly what we want. The tea experiment starts by making a fixed number of cups with milk added, and a fixed number of cups with tea added. This assumption bleeds through into the calculation of the p-value: as usual, Fisher's test calculates the p-value by looking at all possible contingency tables that are "more extreme" (less probable) than our data, and then adding up the probability of all those tables to obtain a p-value. (The probability of a table is calculated with some multinomial math: see the Wikipedia article for details). However, while looking for more extreme tables it only looks at tables that add up to the same column and row totals as our data. With our earlier example, we have:

              Small Company   Large Company   Row total
Good Art           28              16            =44
Bad Art            12               6            =18
Column total      =40             =22

All the marginal totals (the values prefixed with =) would be held constant. See the extended example on Wikipedia, especially if you're confused how we can shuffle the numbers around while keeping the sums the same.
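With the margins pinned, a 2x2 table has only one free cell, so the whole test fits in a few lines. This is a sketch, not the routine R uses: the function name is mine, and I'm assuming the usual hypergeometric formula for the probability of a table with fixed row and column totals.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(top_left):
        # Once the margins are fixed, the top-left cell determines the table.
        return Fraction(
            comb(row1, top_left) * comb(row2, col1 - top_left),
            comb(total, col1),
        )

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is as probable as, or less probable than, ours.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed
    )


print(float(fisher_exact_two_sided(28, 16, 12, 6)))  # the art table
print(float(fisher_exact_two_sided(3, 1, 1, 3)))     # a tea-tasting style table
```

Exact fractions are used so that ties between equally probable tables are compared without floating point fuzz.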

This assumption does not exactly hold in our data: we didn't start by getting 10 large companies and 10 small companies and then having them make a game. If we did, it would be unquestionably correct to hold the column counts constant. As it stands, it's better to treat the column and row totals as variables, instead of constants.

Barnard

Helpfully, there's another test that drops that assumption: Barnard's test. It's also exact, and also produces a p-value from a contingency table. It's very similar to Fisher's test, but does not hold the column and row sums constant when looking for more extreme tables (note that it does hold the total number of data points constant). There are several variants of Barnard's test based on how exactly one calculates whether a table is more extreme or not, but the Boschloo-Barnard variant is held to be always more powerful than Fisher's test.

The major problem with Barnard is that it is computationally expensive: all the other tests run in under a second, but running even approximate forms of Barnard takes considerably longer. Solving for non-approximate forms of Barnard with both columns and rows unfixed takes tens of minutes. With 66 comparisons to calculate, this means that it's something to leave running overnight with a beefy computer (thank the gods for Moore's law).

You can see the R package documentation (PDF) for more details on the different flavors of Barnard available, and all the different options available. In our case, we'll use Boschloo-Barnard, and allow both rows and columns to vary.

Outcomes

So now we have our data, a test that will tell us whether the populations differ in a significant way, and a few ways to adjust our p-values to account for multiple comparisons. All that remains is putting it all together.

When we get a p-value for each comparison, we get (drum roll): results in a Google Sheet, or a plain CSV.

It turns out that precisely 1 result passes the traditional p=0.05 threshold with Barnard's test. This is especially bad: if there was no effect whatsoever, we would naively expect 66 \cdot 0.05 \sim 3 of the comparisons to give back a "significant" result. So, we didn't even reach the level of "spurious results producing noise", far away from the multiple comparison adjusted thresholds we calculated earlier.

This is partly due to such a lack of data that some of the tests simply can't run: for example, no large companies touched on their experience with community support, either good or bad. With one empty column, none of the tests can give back a good answer. However, only a few comparisons had this exact shortcoming; the rest likely suffer from a milder version of the same problem, where there were only tens of data points on each side, which doesn't inspire confidence in our data, and hence produces higher p-values.

In conclusion, there's nothing we can conclude, folks, it's time to pack it up and go home.

p-value Pyrotechnics

Or, we could end this Mythbusters style: the original experiment didn't work, but how could we make it work, even if we were in danger of losing some limbs?

In other words, the data can't pass a p=0.05 threshold, but that's just a convention decided on by the scientific community. If we loosened this threshold, how far would we have to loosen it in order to have a statistically significant effect in the face of multiple comparisons and the poor performance of our data?

It turns out that reversing Bonferroni correction is impossible: trying to multiply p=0.023 (the lowest Barnard-Boschloo p-value) by 66 hands back 0.023 \cdot 66 \sim 1.5, which is over 1.0 (100%), which is ridiculous and physically impossible. The same holds for Holm-Bonferroni, since it's built on Bonferroni.

So let's ditch Barnard-Boschloo: the t-test hands back a small p-value in one case, at 5.14 \cdot 10^{-6}. This we can work with! 5.14 \cdot 10^{-6} \cdot 66 = 0.000339, far below 0.05. This is pretty good, this outcome even passes our stricter multiple-comparisons adjusted tests. But what if we wanted more statistically valid results? If we're willing to push it to the limit, setting p_{overall}=0.9872 gives us just enough room to snag 3 statistically significant conclusions, either with Bonferroni or Holm-Bonferroni applied to the t-test results. Of course, the trade-off is that we are virtually certain that we are accepting a false positive conclusion, even before taking into account that we are using p-values generated by a test that doesn't exactly match our situation.

Reversing Sidak correction gets us something saner: with 66 tests and our lowest Barnard-Boschloo p-value, p=0.023, we have an overall 1-(1-0.023)^{66}=p_{overall}=0.785. Trying to nab a 2nd statistically significant conclusion pushes p_{overall}=0.991. Ouch.
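Running the corrections backwards is just the same formulas solved for the overall threshold. A small sketch, with helper names that are mine:

```python
def bonferroni_overall(p_comparison, m):
    """Overall threshold implied by a per-comparison threshold under Bonferroni."""
    return p_comparison * m


def sidak_overall(p_comparison, m):
    """Overall threshold implied by a per-comparison threshold under Sidak."""
    return 1 - (1 - p_comparison) ** m


print(bonferroni_overall(0.023, 66))    # over 1.0, so not a usable probability
print(bonferroni_overall(5.14e-6, 66))  # the t-test's best p-value
print(sidak_overall(0.023, 66))         # the Barnard-Boschloo best p-value
```

Unlike the Bonferroni version, the Sidak version can never climb past 1.0, which is why it hands back something interpretable here.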

This means that we can technically extract conclusions from this data, but the conclusions are ugly. A p=0.785 means that if there is no effect in any of our data, we expect to see at least one spurious positive result around 78% of the time. It's worse than a coin flip. We're not going to publish in Nature any time soon, but we already knew that. Isn't nihilism fun?

Conclusions

So, what did we learn today?

  • How to correct for multiple comparisons: if there are many comparisons, you have to adjust the strictness of your tests to keep the overall false positive rate in check.
  • How to compare proportions of binary outcomes in two different populations.

At some point I'll do a Bayesian analysis for the folks in the back baying for Bayes: just give me a while to get through a textbook or two.

Thanks for following along.

Appendix: Running the Analysis

If you're interested in the nitty gritty details of actually running the analyses, read on.

For doing the power analysis, you want to install the pwr package in R. In order to run a power analysis for the proportion comparison we'll end up doing, use the pwr.2p.test function (documentation (PDF)), and use n=155 data points and a "medium" effect size (0.3). The function will hand back a power value, which is the complement of the false negative rate (1-\text{"false negative rate"}). If you want to do a power analysis for other sorts of tests not centered around comparing population proportions, you will need to read the pwr package documentation for the other functions it provides.
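If you don't have R handy, the same number can be approximated with the Python standard library. This is a sketch resting on two assumptions of mine: that pwr.2p.test uses the usual normal approximation for a two-sided test, and that it treats n as the number of observations in each group. The function name is made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(effect_size, n_per_group, sig_level=0.05):
    """Approximate power of a two-sided, two-proportion test (normal approximation)."""
    normal = NormalDist()
    critical = normal.inv_cdf(1 - sig_level / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Chance of landing past either critical value when the effect is real.
    return (1 - normal.cdf(critical - shift)) + normal.cdf(-critical - shift)


power = two_proportion_power(0.3, 155)
print(power, 1 - power)  # power, then the false negative rate
```

The second number printed should land near the ~25% false negative rate quoted in the main post.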

Now on to the main analysis…

First, download the xlsx file provided by the paper author (gamasutra_codes.xslx, in a zip hosted by Google Drive).

The "Codes" sheet contains all the raw data we are interested in. Extract that sheet as a CSV file if you want to feed it to my scripts. The "Results" sheet is also interesting in that it contains what was likely the original author's analysis step, and makes me confident that they eyeballed their results and that statistical power was not considered.

Second, we need to digest and clean up the data a bit. To paraphrase Edison, data analysis is 99% data cleaning, and 1% analysis. A bit of time was spent extracting just the data I needed. Lots of time was spent defending against edge cases, like case rows not all having the same variable values that should be the same, and then transforming the data into a format I better understood. There are asserts littering my script to make sure that the format of the data stays constant as it flows through the script: this is definitely not a general purpose data cleaning script.

You can check out the data cleaning script as a Github gist (written in Python).

This data cleaning script is meant to be run on the CSV file we extracted from the xlsx file earlier (I named it raw_codes.csv), like so:

python input_script.py raw_codes.csv clean_rows.csv 

The actual data analysis itself was done in R, but it turns out I'm just not happy "coding" in R (why is R so terrible?[15][16]). So, I did as much work as possible in Python, and then shipped it over to R at the last possible second to run the actual statistical tests.

Get the Python wrapper script, also as a Github gist.

Get the R data analysis script used by the wrapper script, also as a Github gist.

The R script isn't meant to be invoked directly, since the Python wrapper script will do it, but it should be in the same directory. Just take the CSV produced by the data cleaning step, and pass to the wrapper script like so:

python analysis.py clean_rows.csv \
    --t_test --fischer_test \
    --barnard_csm_test \
    --barnard_boschloo_test 

This produces a CSV analysis_rows.csv, which should look an awful lot like the CSV I linked to earlier.


Math rendering provided by KaTeX.


[1] The video game community has a culture that encourages doing a public retrospective after the release of a game, some of which end up on Gamasutra, a web site devoted to video gaming.

[2] The authors tried to stay in sync while encoding the postmortems to make sure that each rater's codings were reasonably correlated with each other, but they didn't use a more rigorous measure of inter-rater reliability, like Cronbach's alpha.

[3] Even if the company is going under, there are likely repercussions a no-holds-barred retrospective would have for the individuals involved.

[4] It turns out Microsoft wiped the dataset supposedly offered (probably due to a site re-organization: remember, it's a shame if you lose any links on your site!), but thankfully one of the authors had a copy on their site. Kudos to the authors, and that author in particular!

[5] This is also your notice that this post will be focusing on traditional frequentist tools and methods. Perhaps in the future I will do another post on using Bayesian methods.

[6] One of the curious things that seems to fall out of this formulation of the p-value is that you can obtain wildly different p-values depending on whether your outcome is a little less or a little more likely. Consider that there are 100 events, 98 of which happen with probability 1/100, and one that happens with probability 0.00999 (event A), for 0.01001 remaining probability on the last event (event B). If event A happens, p=0.00999, but if event B happens, p=1.0. These events happen with mildly different probabilities, but lead to vastly different p-values. I don't know how to account for this sort of effect.

[7] This is kind of a strange case, but it makes sense after thinking about it. Getting an equal number of heads and tails would be the most likely outcome for a fair coin (even if the exact outcome happens with low probability, everything else is more improbable). Since we're flipping an odd number of times, there is no equal number of heads and tails, so we have to take the next best thing, an almost equal number of heads and tails. Since there are only 3 flips, the most equal it can get is 2 of one kind and 1 of another. Therefore, every outcome is as likely or less so than 2 heads and a tail.

[8] However, note that separate fields will use their own p-value thresholds: physics requires stringent p-values for particle discovery, with p=0.0000003 as a threshold.

[9] This wouldn't be such a big deal if people didn't require lots of publications for tenure, or accepted negative results for publication. However, we're here to use science, not fix it.

[10] Reminder: I'm almost certainly doing something wrong in this post. If you know what it is, I would love to hear it. TELL ME MY METHODOLOGY SINS SO I CAN CONFESS THEM. It's super easy, I even have an anonymous feedback form!

[11] So why does the Sidak equation have that form?

Let's say that you are trying to see Hamilton, the musical, and enter a lottery every day for tickets. Let's simplify and state that you are always 1 out of 1000 people competing for one ticket, so you have a 0.001 chance of winning a ticket each day.

Now, what are the chances that you win at least once within the next year (365 days)? You can't add the probability of winning 365 times: if you extend that process, you'll eventually have more than 100% chance of winning, which simply doesn't make sense. Thinking about it, you can never play enough times to guarantee you will win the lottery, just play enough times that you will probably win. You can't multiply the probability of winning together 365 times, since that would be the probability that you win 365 times in a row, an appropriately tiny number.

Instead, what you want is the probability that you lose 365 times in a row; then inverting that gets you the probability that you win at least once. The probability of losing on any one day is 0.999, so the probability of losing 365 times in a row is 0.999^{365} = 0.694. But we don't want the probability of losing 365 times in a row: we want the chance that doesn't happen. So we invert by subtracting that probability from 1, 1-0.694, for a total probability of winning equal to 0.306.

Generalizing from a year to any number of days N, this equation calculates the total probability of winning.


p_{total} = 1 - (1 - p_{winning})^N

Which looks an awful lot like the Sidak equation. The exponent contains an N instead of a \frac{1}{m}, since p_{total} corresponds with p_{overall} in the Sidak equation: solving for p_{winning} will net you the same equation.
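To make the arithmetic concrete, here is a minimal Python sketch of the lottery calculation and its inversion. The function names are made up for illustration, and the second function assumes the usual form of the Sidak correction, p_{each} = 1 - (1 - p_{overall})^{1/m}, with independent draws.

```python
def prob_at_least_one_win(p_win: float, n_days: int) -> float:
    """Chance of winning at least once in n_days independent daily lotteries."""
    p_lose_every_day = (1 - p_win) ** n_days
    return 1 - p_lose_every_day


def per_day_prob_for_total(p_total: float, n_days: int) -> float:
    """Invert the formula above: the per-day probability that yields p_total."""
    return 1 - (1 - p_total) ** (1 / n_days)


p_year = prob_at_least_one_win(0.001, 365)
print(round(p_year, 3))  # 0.306

# Solving back for the per-day probability recovers the 0.001 we started with.
print(round(per_day_prob_for_total(p_year, 365), 6))  # 0.001
```

The second function is the same equation solved for the other variable, which is why the exponent flips from N to 1/N.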

[12] An unstated assumption throughout the post is that each measure of each variable is independent of each other measure. I don't know how to handle situations involving less-than-complete independence yet, so that's a topic for another day. This will probably come after I actually read Judea Pearl's Causality, which is a 484 page textbook, so don't hold your breath.

[13] The manual page for prop.test was not forthcoming with this detail, so I had to find this out via CrossValidated.

[14] It's adorable how Victorian the experiment sounds.

[15] Allow me to briefly rant about R's package ecosystem. R! Why the fuck would you let your users be so slipshod when they make their own packages? Every other test function out there takes arguments in a certain format, or a range of formats, and then a user defined package simply uses a completely different format for no good reason. Do your users not care about each other? Do your dark magicks grow stronger with my agony? Why, R!? Why!?

[16] I suppose I really should be using pandas instead, since I'm already using python.

Tools I Use

I’ve been thinking about whether the tools I use to get things done are good enough. Where are the gaps in my toolset? Do I need to make new tools for myself? Do I need to make tools that can make more tools[1]?

Before diving too deep, though, I thought it would be helpful to list out the tools I use today, why I use them, and how I think they could be better. It’s a bit of a dry list, but perhaps you’ll find one of these tools is useful for you, too.


Getting Things Done

Habitica/HabitRPG

Say what you will about gamification, but when it works, it works.

I wasn’t a habitual child, adolescent, or young adult. I had the standard brush/floss teeth habit when going to sleep, and nothing much beyond that. Sure, I tried to cultivate the habit of practicing the violin consistently, but that culminated with only moderate success in my early college years.

Then I picked up HabitRPG (now Habitica) in 2014, and suddenly I had to keep a central list of habits up to date on a daily basis, or I would face the threat of digital death. Previous attempts at holding myself to habits would track my progress on a weekly basis, or fail to track anything at all, but the daily do-or-die mentality built into Habitica got me to keep my stated goals at the forefront of my mind. Could I afford to let this habit go unpracticed? Am I falling into this consistent pattern of inaction which will get me killed in the long run? It was far from a cure-all, but it was a good first step to getting me to overcome my akrasia and do the things that needed to be done[2].

Currently, I only use the daily check-in features (“Dailies”): at first I also used the todo list, but it turned out that I wanted much, much more flexibility in my todo system than Habitica could provide, so I eventually ditched it for another tool (detailed below). I simply never got into using the merit/demerit system after setting up merits and demerits for myself.

org-mode

I have tried making todo lists since I was a young teenager. The usual pattern would start with making a todo list, crossing a couple items off it over a week, and then I would forget about it for months. Upon picking it back up I would realize each item on the list was done, or had passed a deadline, or I didn’t have the motivation for the task while looking at the list. At that point I would throw the list out; if I felt really ambitious in the moment, I would start a new list, and this time I wouldn’t let it fade into obsolescence…

Habitica fixed this problem by getting me into the habit of checking up on my todo list on a regular basis, which meant my todo lists stopped getting stale, but the todo list built into the app was just too simple: it worked when I had simple one-step tasks like “buy trebuchet from Amazon” on the list, but complicated things like “build a trebuchet” would just sit on the list. It never felt like I was making forward progress on those large items, even when I worked for hours on them, and breaking up the task into parts felt like cheating (since you get rewarded for completing any one task[3]), but more importantly it made my todo list long, cluttered, and impossible to sort. Additionally, I wanted to put things onto the list that I wanted to do, but weren’t urgent, which would just compound how cluttered the list would be. For scale, I made a todo spreadsheet in college that accumulated 129 items, most of which weren’t done by the end of college and would have taken weeks of work.

So I needed two things: a way to track all of the projects I wanted to do, even the stupid ones I wouldn’t end up doing for years, and a way to track projects while letting me break them down into manageable tasks.

After a brief stint of looking at existing todo apps, and even foraying into commercial project management tools, I decided I was a special unique flower and had to build my own task tracker, and started coding.

After weeks of this, one of my friends started raving about org-mode, the flexible list-making/organization system built inside of Emacs (the text editor; I talk about it some more below). He told me that I should stop re-implementing the wheel: since I was already using Emacs, why not just hack the fancy extra stuff I wanted from a todo system on top of org-mode, instead of tediously re-implementing all the simple stuff I was bogged down in? So I tried it, and it’s worked out in exactly that way. The basics are sane and easy to use, and since it’s just an Emacs package, I can configure and extend it however I want.

Like I implied earlier, I use my org-mode file as a place to toss all the things that I want to do, or have wanted to do; it’s my data pack-rat haven. For example, I have an item that tracks “make an animated feature length film”[4], which I’m pretty sure will never happen, but I keep it around anyways because the peace of mind I can purchase with a few bytes of hard drive space is an absolute bargain. It doesn’t matter that most of my tasks are marked “maybe start 10 years from now”, just that they’re on paper disk and out of my head.

And like I implied earlier, org-mode really got me to start breaking down tasks into smaller items. “Build a trebuchet” is a long task with an intimidating number of things to do hidden by a short goal statement; breaking it down into “acquire timber” and “acquire chainsaw” and “acquire boulders” is easier to think about, and makes it clearer how I’m making progress (or failing to do so).

The last big feature of org-mode that I use is time tracking, which lets me log the time I spend on specific tasks. I do a weekly review, and org-mode lets me look at which tasks I worked on, and for how long. For example, I used to think that I wrote blog posts by doing continual short edit/revision cycles, but it turned out that I usually had the revision-level changes nailed down quickly, but then I had long editing cycles where I worried about all the minutiae of my writing. Now I’m more realistic about how much time I spend writing, and how quickly I can actually write, instead of kidding myself that I’ll be happy with just an hour of editing[5].

Org-mode isn’t for everyone. It only really works on desktop OS’s (some mobile apps consume/edit the org-mode file format, but only somewhat), so it’s hard to use if you aren’t tied to a desktop/laptop. And the ability to extend it is tied up in knowing an arcane dialect of lisp and a willingness to wrestle with an old editor’s internals. And you might spend more time customizing the thing than actually getting things done. But, if you’re bound to a desktop anyways, and know lisp, and have the self discipline to not yak shave forever, then org-mode might work for you.

Inbox

Nothing out of the ordinary here, it’s just Google email. Aside from handling my email, I primarily use the reminders feature: if there are small recurring tasks (like “take vitamins”), then I just leave them in Inbox instead of working them into org-mode. At some point they’ll probably move into org-mode, but not yet.

Keep / Evernote

I started using Evernote in 2011 or so, and switched to Keep last year when Evernote tried to force everyone to pay for it. Originally, I bought into the marketing hype of Evernote circa 2011: “Remember Everything”. Use it as your external brain. Memorizing is for chumps, write it down instead.

And I took the “Everything” seriously. How much did I exercise today? What did I do this week? What was that interesting link about the ZFS scrub of death? Why did I decide to use an inverted transistor instead of an inverted zener diode in this circuit? It’s all a search away.

I recognize that this level of tracking is a bit weird, but recalling things with uncanny precision is helpful. For example, while I was doing NaNoWriMo in November, I had years of story ideas and quips as notes; if I sort of half-remembered that I had an idea where Groundhog Day was a desperate action movie instead of a comedy, I could just look up what sorts of plot directions I had been thinking about, or if I had more ideas about the plot over time, and bring to bear all that pent up creative energy.

Less importantly, I use my note taking stream as a mobile intake hopper for org-mode, since there aren’t any mobile org-mode apps I trust with my todo list.

Habit Group

And for something that isn’t electronic: I am part of a habit setting and tracking group. It’s a group of like-minded individuals that all want to be held accountable to their goals, so we get together and tell each other how we are doing while striving towards those goals. It’s using social pressure to get yourself to be the person you want to be, but without the rigid formality of tools like Stickk.


Mobile Apps

Anki

A spaced repetition app, free on Android. See Gwern for an introduction to, and deep dive on, spaced repetition.

I use it to remember pretty random things. There’s some language stuff, mainly useful for impressing my parents and niece with how easily I can pronounce Korean words. There’s some numbers of friends and family, in case I somehow lose my phone and find a functioning payphone. There’s a subset of the IPA alphabet, in case I need to argue about pronunciation.

I have some more plans to add to this, but mostly covering long-tail language scenarios. If you’ve read Gwern’s introduction above, you’ll remember that the research implies that mathematical and performance knowledge are not as effective to memorize through spaced repetition as language and motor skills, so I’m not really in a rush to throw everything into an Anki deck.

Google Authenticator

This is your reminder that if you’re not using two-factor authentication, you really should be. Two factor means needing two different types of things to log in: something you know (a password) and something you have (a phone, or other token). This way, if someone steals your password over the internet, you’re still safe as long as they don’t also mug you (applicable to most cybercriminals).

Password Manager

On a related note, if you aren’t using a password manager then you should be using one of those, too. The idea is to unlock your password manager with a single strong password, and the manager remembers your actual passwords for all your different accounts. Since you don’t have to remember your passwords, you can use a different strong random password for each different service, which is much more secure than using the same password for everything[6]. As a starting point, see The Wirecutter’s best password manager recommendations[7].

Feedly

For reading RSS feeds. I follow some bloggers (SSC, Overcoming Bias), some science fiction authors (Stross, Watts), and the Tor.com short story feed.

However, Feedly isn’t especially good. The primary problem is the flaky offline support. Go into a tunnel? There’s no content cache, so you can’t read anything if you didn’t have the app open at the exact moment you went underground. (I imagine this is mostly a problem in NYC).

Plus, the screens are broken up into pages instead of being in one scrolling list, which is weird. It’s okay enough to get me to not leave, but I’m on the lookout for a better RSS reader.

Swarm

Location check-in app, throwing it back to 2012. Sure, it’s yet another way to leak out information about myself, like whether I’m on vacation, but governments and ginormous companies already can track me, so it’s more a question of whether I want to track myself. Swarm lets me do that, and do it in a way that is semantically meaningful instead of just raw long/lat coordinates.

Kobo eReader

My trusty e-reader, which I’ve written about before. It currently runs stock firmware, but I recently learned about an exciting custom firmware I had missed, koreader, which looks like it solves some of the PDF problems I had bemoaned before. We’ll see if I can scrounge up some time to check it out.


Desktop Software

Emacs

Text editor Operating system. What org-mode is layered on top of. If you’re clicking around with a mouse to move to the beginning of a paragraph so you can edit there, instead of hitting a couple of keys, you’re doing it wrong.

Also make sure to map your caps lock key to be another control, which is easily one of the higher impact things on this list that you can do today, even if you will never use Emacs. Now, you don’t have to contort your hand to reach the control keys when you copy-paste, or when you issue a stream of Emacs commands.

Ubuntu

Running 16.04 LTS, with a ton of customization layered on top. For example, I replaced my window manager with…

xmonad

Tiling window manager for Linux. All programs on the desktop are fully visible, all the time. This would be a problem with the number of programs I usually have open, but xmonad also lets you have tons of virtual desktops you can switch between with 2 key-presses. I suspect that this sort of setup covers at least part of the productivity gains from using additional monitors.

Caveat for the unwary: like org-mode, xmonad is power user software, which you can spend endless time customizing to an inane degree (to be fair, it’s usually a smaller amount of endless time than org-mode).

Redshift

Late night blue light is less than ideal. Redshift is a way to shift your screen color away from being so glaringly blue on Linux.

There are similar programs for other platforms:

However, the default behavior for most of these apps is to follow the sun: when the sun sets, the screen turns red. During the winter the sun sets at some unreasonable hour when I still want to be wide awake, so there’s some hacking involved to get the programs to follow a time-based schedule instead of a natural light schedule.
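As a minimal sketch of what a clock-based hack could look like (one possible approach, not a description of any particular setup): the Python below picks a Redshift invocation from the wall-clock time. The schedule hours and the 3500 K night temperature are illustrative assumptions; `-O` (one-shot manual temperature) and `-x` (reset) are Redshift command-line flags.

```python
from datetime import time
from typing import List

# Hypothetical schedule: these hours and this temperature are illustrative.
NIGHT_START = time(22, 0)
NIGHT_END = time(7, 0)
NIGHT_TEMP_K = 3500


def redshift_command(now: time) -> List[str]:
    """Return the redshift invocation appropriate for the given clock time."""
    # The night window wraps past midnight, so it is an "or" of two ranges.
    is_night = now >= NIGHT_START or now < NIGHT_END
    if is_night:
        # One-shot mode: set a fixed color temperature and exit.
        return ["redshift", "-O", str(NIGHT_TEMP_K)]
    # Reset mode: remove any adjustment from the screen.
    return ["redshift", "-x"]


print(redshift_command(time(23, 30)))
print(redshift_command(time(12, 0)))
```

A cron job or timer could run this every few minutes and pass the returned list to `subprocess.run`, which keeps the schedule tied to the clock rather than to sunset.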

Crackbook/News Feed Eradicator (Chrome extensions)

I’m sure you’re aware of how addictive the internet can be (relevant XKCD). These extensions help me make sure I don’t mindlessly wander into time sinks.

I use Crackbook by blocking the link aggregators I frequent, hiding the screen for 10 seconds: if there’s actual content I need to see, or if I’m deliberately relaxing, then 10 seconds isn’t too much time to spend staring at a blank screen. But if I just tabbed over without thinking, then those 10 seconds are enough for second thoughts, which is usually enough to make me realize that I’ve wandered over by habit instead of intention, and by that point I just close the tab.

The News Feed Eradicator is pretty straightforward: it just removes Facebook’s infinite feed, without forcing a more drastic action, like deleting your Facebook. For example, it’s easy for me to see if anyone had invited me to an event[8], but I don’t get sucked into scrolling down the feed forever and ever.

This will not work for everyone: some people will go to extreme lengths to get their fix, and extensions are easy to disable. However, it might work for you[9].


Things I Made To Help Myself

Newsletter Aggregator Tool

I made a personal tool to create the monthly/quinannual/annual newsletters I send to the world. It’s my hacked up replacement for social networking.

Throughout the month/year/life, I keep the tool up to date with what’s happening, and then at the end of the month it packages everything up and sends it in one email. It’s not strictly necessary, since I could just write out the email at the end of the month/year, but it feels like less of a time sink, since I’m spreading the writing out over time instead of spending a day writing up a newsletter, and that means I’m willing to spend more time on each entry.

Writing Checker Tool

There are a number of writing checkers out there: some of them aren’t even human.

There’s the set of scripts a professor wrote to replace himself as a PhD advisor. There are some folks that are working on a prose linter (proselint, appropriately), which aims to raise the alarms only when things are obviously wrong with your prose (“god, even a robot could tell you ‘synergy’ is bullshit corporate-speak!”). There have been other attempts, like Word’s early grammar checker, and the obvious spellchecker, but they all stem from trying to automate the first line of writing feedback.

My own script isn’t anything exciting, since it uses other scripts to do the heavy lifting, like the aforementioned proselint and PhD scripts. So far the biggest thing I added to the linter is a way to check markdown links for doubled parentheses, like [this link](https://en.wikipedia.org/wiki/Solaris_(2002_film)): unless the inner parentheses are escaped with \, the link won’t include the last ), probably preventing the link from working, and a dangling ) will appear after the link.
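A minimal sketch of that kind of check, in Python. This is an illustration rather than the actual script: the regex and function name are made up here, and it assumes a renderer that ends the link at the first closing parenthesis.

```python
import re

# Matches "[text](destination)" where the destination runs up to the first
# closing parenthesis, which is where a naive renderer would end the link.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)]*)\)")


def find_unescaped_paren_links(text: str) -> list:
    """Return link destinations that contain an unescaped "(" before the first ")"."""
    flagged = []
    for match in LINK_RE.finditer(text):
        destination = match.group(1)
        # An opening parenthesis not preceded by a backslash is unescaped.
        if re.search(r"(?<!\\)\(", destination):
            flagged.append(destination)
    return flagged


bad = "[this link](https://en.wikipedia.org/wiki/Solaris_(2002_film))"
good = r"[this link](https://en.wikipedia.org/wiki/Solaris_\(2002_film\))"
print(find_unescaped_paren_links(bad))   # flags the truncated destination
print(find_unescaped_paren_links(good))  # []
```

The flagged string is the destination as the naive renderer would see it, missing the final parenthesis, which makes the breakage easy to spot in the linter output.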

There are more things I plan on adding (proper hyphenation in particular is a problem I need to work on), but I’ve already used the basic script for almost every blog post I’ve written in 2016. Notably, it’s helping me break my reliance on the word “very” as a very boring intensifier, and helped me think closely about whether all the adverbs I strew around by default are really necessary.


Real Life

The 7 Minute Workout

Exercising is good for you, but it wasn’t clear to me how I should exercise. Do I go to the gym? That’s placing a pretty big barrier in front of me actually exercising, given that gyms are outside and gym culture is kind of foreign to me. Do I go running? It’s a bit hard to do so in the middle of the city, and I’ve heard it’s not good for the knees[10]. Places to swim are even harder to reach than gyms, so that’s right out.

What about calisthenics? Push ups, sit ups, squats and the like. It requires barely any equipment, which means I can do it in my room, whenever I want. While thinking about this approach, I came across the 7 minute workout as detailed by the NY Times. Is it optimal? Certainly not; it won’t build muscle mass quickly or burn the most calories[11]. Is it good enough, in the sense of “perfect is the enemy of good”? Probably! So I started doing the routine and have been doing it for 3.5 years.

I’ve made my own tweaks to the routine: I use reps instead of time, use dumbbells for some exercises, and swapped out some parts that weren’t working. For example, I didn’t own any chairs tall enough to do good tricep dips on, so I substituted it with overhead triceps extensions.

And, well, I haven’t died yet, so it’s working so far.

Cleaning Checklist

After reading The Checklist Manifesto, I only made one checklist (separate from my daily Habitica list, which I was already using), but I have been using that checklist on a weekly basis for more than a year.

It’s a cleaning checklist. I use it to keep track of when I should clean my apartment, and how: not every week is “vacuum the shelves” week, but every week is “take out the trash” week. It has been helpful for making sure I don’t allow my surroundings to descend into chaos, which was especially helpful when I lived alone.

Meditation and Gratitude Journaling

Meditation I touch on in an earlier blog post; it builds up your ability to stay calm and think, even when your instinct rages to respond. Gratitude journaling is the practice of writing down the things and people you are grateful for, which emphasizes to yourself that even when things are bad, there’s some good in your life.

I’m wary about whether either of these actually work, or are otherwise worth it, but lots of people claim they do, and to a certain extent, they feel like they do. In a perfect world I would have already run through a meta-analysis to convince myself, but I don’t know how to do that yet, so I just do both meditation and gratitude journaling; they’re low cost, so even if they turn out to not do anything it’s not too big a loss.

Book/Paper Lists

I keep spreadsheets with the books I am reading, have read, and want to read. I do the same with academic papers.

It’s not just “I read this, on this date”: I also keep track of whether I generally recommend them, and a short summary of what I thought of the book, which is helpful when people ask whether I recommend any books I read recently. On the flipside, I also use the list as a wishlist to make sure I always have something interesting to read.


That’s it for now! We’ll see how this list might change over the next while…


[1] And when I do make tools that make tools, should it be a broom or bucket?

[2] Obviously, this won’t work for everyone. If you’re not motivated by points and levels going upwards, but the general concept appeals to you, Beeminder might be more motivating, since it actually takes your money instead of imaginary internet points.

[3] Conceivably, you could make this work by creating tasks to take a certain amount of time (like 30 minutes) so each item is time based instead of result based, and treat that as Just The Way You Use The Habitica Todo List.

[4] Don’t worry, it’s more fleshed out than this: I’m not keen on doing something for the sake of doing something, like “write my magnum opus, doesn’t matter what it’s about”. Come on, it has to matter somehow!

[5] It’s certainly possible that I should try to edit faster, or move towards that short and repeated revise-edit cycle, but this is more about having a clear view of what I’m actually doing now, after which I can decide how I should change things.

[6] If you use the same password everywhere, then your password is only as secure as the least secure site you use. Suppose you use the same password at your bank and InternetPetsForum, and InternetPetsForum hasn’t updated their forum software in 12 years. If InternetPetsForum is hacked, and your password was stored without any obfuscation, the hackers are only a hop and skip away from logging into your bank account, too.

[7] I’m declining to state exactly which password manager to use; while security through obscurity isn’t really viable for larger targets, I’ve picked up enough residual paranoia that disclosing exactly which service/tool I use seems to needlessly throw away secrecy I don’t need to throw away.

[8] lol

[9] And if you want something that’s less easy to disable, then SelfControl or Freedom might be more your speed. I can’t personally vouch for either.

[10] Honestly not really a true objection, but saying “running is hard” makes me feel like a lazy bum. I already did 20 pushups, what more do you want?!

[11] If you are interested in optimality in exercise, I’ve heard good things about Starting Strength.

Transcript 7X-2: A Zoothropological Perspective

Thank you all for coming to today’s seminar on the Zootopia artifact, recovered during our excursion on planet 7X. I’ll be diving a bit deeper into the implications of the recording, especially those clues that might reveal the reason for their civilization’s demise.

You should have gotten a copy of the translated recording last night, but for those that skipped viewing it, the story matches our own “cop buddy” movies, with unlikely partners pairing up to right wrongs and become friends. Additionally, the recording conveys a message of tolerance, even to those highly unlike yourself.

However, there are hints throughout the recording that there exist multiple conflicts and instabilities brewing beneath the surface of society, any of which might have been the cause of the end of their civilization.

(Of course, everything must be taken with a grain of salt. I will interpret the recording in earnest, but the recording may be presenting a biased/utopic/dystopic view of Zootopian society. However, given the extreme degradation of the other artifacts recovered, I will simply have to assume that the recording reflects Zootopian reality.)

A Malthusian World

The most obvious problem is the looming Malthusian trap. We catch a glimpse of Bunny Burrow’s population near the beginning of the film, and can extrapolate an approximate growth rate. Pegging Bunny Burrow at the visible 8,143,580 individuals, and growing at 1-2 people per second, this rural farming town is almost as large as New York City, and growing around 2 to 4 times as quickly in terms of births assuming no deaths or immigration. Once we include deaths into the population counter, then the birthrate must be even larger.
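For those who want to check the arithmetic, here is a minimal Python sketch that converts the counter's tick rate into a yearly figure. It assumes the counter shows births only and ignores compounding; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def annual_growth(population: int, births_per_second: float) -> tuple:
    """Return (individuals added per year, that number as a multiple of population)."""
    added = births_per_second * SECONDS_PER_YEAR
    return added, added / population


for rate in (1, 2):
    added, multiple = annual_growth(8_143_580, rate)
    print(f"{rate}/s: +{added:,.0f} per year, {multiple:.1f}x the current population")
# 1/s: +31,536,000 per year, 3.9x the current population
# 2/s: +63,072,000 per year, 7.7x the current population
```

Even at the low end, the town would add several times its current population in a single year.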

Humanity dodged predictions of a Malthusian trap in the latter half of the 1900s with a green revolution and a novel tendency for rising standards of living to lead to lower birthrates. However, it’s not clear that either of these did or could happen on 7X. Bunny Burrow is 211 miles from Zootopia, a large and apparently wealthy city (for comparison, Boston is around 200 miles from NYC). Even though the town appears rural, the area is connected to the city with high speed rail, which practically puts Bunny Burrow right next to Zootopia. If Bunny Burrow is selling food to Zootopia, then Bunny Burrow almost certainly has a relatively high standard of living, and yet growth rates are still much greater than replacement. We don’t know what the bunny death rate is like, but unless there’s some system of bunny birth restriction just off screen, each couple giving birth to 275 children will not yield a low enough birthrate to avoid explosive population growth.

On the other hand, it is not apparent there has been a green revolution yet. There are 3 million farmers within the present-day USA, but 8 million farmers on the doorstep of Zootopia, which implies that Zootopian agriculture is closer to America in the 1870s, when half the country was involved with agriculture. It’s also implied that botany or science education is still in its infancy: no one seems to know about “Night Howlers”, an unrestricted plant that elicits an aggressive response in a wide range of species. If botany was underdeveloped relative to the rest of their apparent scientific advancement, then it is possible they could pull off their own green revolution and raise food yields and agricultural productivity. However, the apparent tendency of at least one species (sub-species?) to maintain birthrates in the face of prosperity simply means a massive yet finite increase in agricultural output would only forestall the inevitable.

To fully sketch a bleak world, once 7X nears carrying capacity, any change in agricultural productivity (say, a volcano dusting up the stratosphere) would cause famine. Human responses to famine are varied, so we can’t rule out responses such as violent revolution, widespread debt enslavement while people try to raise the funds to buy increasingly expensive food, or even simple mass death. It’s possible that any of these contributed to the ultimate desolation of Zootopia.

Divisions in Society

Contrary to the main message of the movie, another source of strife would be the highly heterogeneous nature of Zootopian society.

It is unclear how old the accord between herbivores and carnivores is; the introductory skit doesn’t elaborate beyond “thousands of years ago”, which is ambiguous. “There was war thousands of years ago” does not preclude “there was war tens of years ago”. There are hints, though, that the accord is a recent event.

By our eyes, Zootopia looks like a new city: high technology abounds, and there is not much creaking infrastructure of the sort you might find in an NYC subway. On the other hand, there are hints that the city is not brand new: the jungle superstructure presumably had to grow while the city provided climate control, and the city has been around for long enough that it has older low-cost housing (which Judy lives in). However, 50 years is more than long enough for a city to develop those sorts of signs of aging, and the overall veneer of the city reflects a shiny new Singapore instead of an older NYC or Paris. Since the accord was signed in Zootopia, the relative youth of the city implies that the accord is also young.

More circumstantial evidence suggests a young accord: predators easily enter a hyper-aggressive state with barely any chemicals applied (the skin is a good barrier against random chemicals entering the bloodstream). If the accord had happened thousands of years ago, one would expect predator aggression to be more easily kept in check, due to thousands of years of study on an important public relations matter.

(To be fair to the inhabitants of 7X, it is likely that Zootopia exaggerates or imagines problems in society. In particular, the “Night Howlers” drug is curiously similar in nature to our own tales of zombies, serving as a fictional boogeyman. Along with the other problems I will detail about “Night Howlers” later, it seems unlikely that it is a real substance, or as dramatic as portrayed. However, as stated before, until a future expedition uncovers contrary evidence we can only take the recording’s word at face value.)

It seems that the accord is young. This means that the peace is more uncertain: institutions that have proven themselves over thousands of years in hundreds of civilizations, like the concept of courts, have shown themselves to be stable across many different circumstances. The accord seems more an uneasy peace that hasn’t had enough time to solidify into an alliance, more like the latest Israel-Palestine ceasefire than today’s peace between pre-Bismarck Germanic states. A societal shock might cause enough strife to break the accord, and the intervening peace would mean both predator and prey are prepared with better weapons.

(On the other hand, coordinating any peace at all between such different groups should be commended. Perhaps the inhabitants of Zootopia have a different enough neural architecture that negotiating and keeping a peace comes easy to them. However, the societal strife caused by Judy’s mid-recording revelations implies that isn’t the case.)

Subsistence Inequality

Another source of instability stems from the inequality coded into the genes of the different Zootopians, with vastly larger inherent differences between Zootopians than between any two humans.

Assuming that the city is relatively young, and that Zootopian society has only recently attained their technological level (much like our own world), it is only recently that smaller animals have gained access to machines with which they could do the work of much larger animals. Since they’re smaller, they don’t have large fixed costs: an elephant has to eat 300 pounds of food a day on an open savanna, while a gerbil has to eat 10 grams of food a day in a square foot cage. If the elephant wants to work at a high frequency trading firm downtown, he has to work remotely with the communication costs that entails, or pay out the trunk for a large city apartment that is still never large enough, but could serve as an outsized mansion for 20 gerbils.

If their society is still moving towards an information technology base, as it seems it is (mobile phones included), then the smaller animals gain more and more of an advantage. And small animals are demonstrably not dumb: a shrew is a successful mafia don, and the employees at the financial institution Lemming Brothers are, well, lemmings. The situation is analogous to Robin Hanson’s virtual person emulation scenario, where the ease with which minds can be replicated and the low cost of virtual living drive wages through the floor, far below human subsistence costs (defined as maintaining the minimum caloric intake needed for living). Back on 7X, the low cost of gerbil living drives wages through the floor, far below elephant subsistence costs[1]. With this discrepancy in living costs, the tendency of smaller animals to have more children becomes more pronounced: it’s easy to support several deadbeat siblings as a gerbil, but a burden to support a deadbeat elephant. Even if the heritability of IQ doesn’t hold for the inhabitants of 7X, this means that it pays to pursue an r-selection strategy as a small animal. The more children you have, the more of them might become breadwinners who can support all their siblings and then some. Over time, gerbils will vastly outnumber elephants.

In other words, tiny animals can eat the lunch of much larger animals. However, there is an existing peaceful integration of animals that literally eat each other: perhaps it is possible to also integrate animals with vastly different subsistence rates. One approach would be to impose a species-specific tax structure, similar to a skewed basic income, or provide a subsidy, like a housing subsidy for larger animals, or normalize different wages for different species[2] (it seems like these schemes aren’t already in place, since Judy doesn’t balk at paying for an elephant-marketed popsicle). Or coming at the problem from a different angle, perhaps their society would implement growth restrictions on faster growing populations, although it’s clear that such restrictions are not in place at the time of the recording.

Additionally, we do not know how long each animal species lives. If gerbils and elephants live as long as their terrestrial counterparts, then the shortness of gerbil lives leaves room for elephants to take on a long term Elder role, acting as a valuable repository of institutional knowledge for teams of short-lived gerbils. However, without more knowledge of Zootopian physiology, we can’t know for certain how their institutions would be structured to take advantage of different species, and whether those institutions would naturally counteract the problem of subsistence inequality or exacerbate it.

Balancing on a Knife Edge

In addition to the other concerns raised, it seems clear that a lot of destructive potential energy is stored in Zootopian society, but it is unclear how much of it is actively contained by their governments.

The first hint is the availability of, and easy-going attitude towards, dangerous drugs like “Night Howlers”. Previously, I pointed out that this meant that botany probably wasn’t advanced, but the advanced technology of the rest of 7X society means that oversights such as this are increasingly dangerous. Drawing a rough analogy, it’s as if the fact that ammonia fertilizers could be used to create explosives were freely available but specialized knowledge, and when a random farmer orders 10 tons of fertilizer over the internet and blows up an orphanage, the government blames the orphanage for being an old creaky building, and says so for months. “Night Howlers” have been an uncontrolled substance for so long, and city police are so unconcerned with copycat terrorist attacks after the events in the Zootopia recording (mass aerosol or water supply attacks leap to mind), that 7X society seems woefully unprepared for what our colleagues in that Three Letter Agency[3] call “independent actors” leveraging all the power a technological society grants them, without any of the checks.

The second hint is the absolutely mind-boggling availability of energy. Creating city-sized micro-climates? It’s an HVAC nightmare, an energy black hole to shovel electrons into. How bad might it be? Let’s do a Fermi estimate: since the climate outside the city is reasonably temperate looking, we might estimate that it is similar in latitude to the farming zone in Western Europe, which means it gets around 50% less direct sunlight than the equator. If the desert climate requires HVAC to make up the rest of the energy usually injected into a more equatorial desert by the sun, then a Manhattan-sized area would require 120TWh of energy over a year [4]. Keep in mind that all 5 boroughs of New York City used around 60TWh in 2009: it requires a city-sized energy budget just to keep one of these climates stable. With 2 more climates to control, the energy expenditure must be staggering. There are some energy savings to be had by the fact that the cooling systems for Tundratown can just dump waste heat directly into Sahara Square, but we’re neglecting to account for the fact that none of these climates are enclosed. It’s well known that you should close your windows when your AC is running under pain of using and paying for more energy than necessary, and the same principle applies here: we never see an enclosing dome dividing the different climates, including the temperate climate in the surrounding area. It’s tough to say exactly how much heat leakage happens between each borough, but it’s likely that the already high energy expenditures become astronomical.

This loose attitude towards energy usage probably means that energy is dirt cheap. However, where is this energy coming from? There’s so much of it, there’s a distinct possibility it’s coming from somewhere unsafe. Certainly, the Zootopians may have had access to liquid thorium reactors, fusion reactors, or more exotic forms of energy generation, but we don’t know that they did, and many of the high-output energy technologies we have access to have dubious trade-offs.

Fossil fuel sources have the downside of undoing their careful climate control (but we do know their world ended, so maybe that played a role). Nuclear energy needs strict controls to ensure it doesn’t aid nuclear proliferation, and with their lax approach to “Night Howlers”, it isn’t out of the question that they would have problems down the line.

Even if it’s safer green energy, the amount of energy in play can still be dangerous. If there are multiple booming populations like Bunny Burrow, and agricultural efficiency isn’t advanced, the rest of the world likely favors farms, not solar panels. However, we know Zootopia had orbital launch capability, since children want to become astronauts when they grow up, which opens up solar energy farms in space. Getting the energy from vast regions of space, though, has some problems. If there’s an orbital laser beaming down energy from an orbital solar array, that’s another opportunity for something to be hacked and aligned with great destructive power. Same with using gravitational potential energy, such as using falling asteroids as an energy source. It’s not worth belaboring the destructive potentials of even higher density energy mediums, like antimatter.

Conclusions

From the moment we arrived on planet 7X, we knew that we had arrived on a dead world. With the end of their journey fixed, we can only look to the past and ask who lived on planet 7X, how they lived, and what brought their civilization to a smoking ruin. We can only hope that by learning more about this one of many civilizations caught by the Great Filter, we can avoid their fate.

That’s it. Thank you for coming. Now, are there any questions?


(And in case there is any doubt: this is not an allegory for the current human condition, or any portion of such. This is a crazy no-holds-barred extrapolation of a children’s movie.)


[1] In case you were wondering, humans aren’t subject to the same problem: on a logarithmic scale, there’s barely any difference in size between small and large humans, and size does not correlate to appetite.

[2] This last suggestion seems straightforward, but probably introduces more knock-on effects. For example, a law enforcing different wages per species likely makes their version of Mechanical Turk vulnerable to illegal competitors: if there are dark web enabling technologies like Tor and Bitcoin, then carrying out “human” intensive tasks will be much cheaper in the black market, since smaller animals could mask their identity and charge rates undercutting large animal rates, but higher than small animal rates.

[3] Hint: all three letters are different. Sorry if you thought I was referencing the FAA.

[4] A back of the blog calculation: sunlight provides 1120 W/m². Manhattan is 59.1 km² large. Assume 10 hours of sunlight a day and 365 days a year. Divide by two due to latitude. Arrive at around 120 TWh/year.
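
For anyone who wants to check the arithmetic in [4], here is a minimal Python sketch of the same estimate. The input figures are the ones quoted in the footnote and in the talk; the function and constant names are my own invention for illustration, and none of this is a real energy model, just the multiplication written out.

```python
# Fermi estimate from footnote [4]. All inputs are the footnote's own
# rough figures; the names are hypothetical, chosen for readability.
SOLAR_IRRADIANCE_W_PER_M2 = 1120   # sunlight at the surface, per the footnote
MANHATTAN_AREA_KM2 = 59.1          # stand-in size for one climate zone
SUNLIGHT_HOURS_PER_DAY = 10        # assumed in the footnote
DAYS_PER_YEAR = 365
LATITUDE_FACTOR = 0.5              # "around 50% less direct sunlight"
NYC_ANNUAL_USE_TWH = 60            # all 5 boroughs, 2009, as quoted in the talk


def annual_hvac_energy_twh(
    irradiance_w_per_m2=SOLAR_IRRADIANCE_W_PER_M2,
    area_km2=MANHATTAN_AREA_KM2,
    hours_per_day=SUNLIGHT_HOURS_PER_DAY,
    days_per_year=DAYS_PER_YEAR,
    latitude_factor=LATITUDE_FACTOR,
):
    """Energy (TWh/year) needed to make up the missing sunlight."""
    area_m2 = area_km2 * 1_000_000             # 1 km^2 = 10^6 m^2
    power_w = irradiance_w_per_m2 * area_m2    # full-sun power over the area
    hours_per_year = hours_per_day * days_per_year
    energy_wh = power_w * hours_per_year * latitude_factor
    return energy_wh / 1e12                    # Wh -> TWh


if __name__ == "__main__":
    estimate = annual_hvac_energy_twh()
    print(f"One climate zone: about {estimate:.0f} TWh/year")
    print(f"That is about {estimate / NYC_ANNUAL_USE_TWH:.1f}x NYC's annual use")
```

Changing any one input (say, 8 hours of sunlight instead of 10) shifts the answer proportionally, which is about all the precision a Fermi estimate deserves.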