Nathan Hwang

2017 Review

If there’s a theme for my 2017, it seems to be FAILURE.

FAILURE at cultivating habits

  • Due to the addition of a morning standup at work, I noticed I was getting in much later than I thought. I could previously pass off some pretty egregious arrival times as “a one time thing” to myself, but not when a hard deadline made it clear that this was happening multiple times a week. So I tried harder to get in earlier, and this made basically no impact.
  • I noticed I was spending a lot of time watching video game streaming; Twitch streams are long, regularly 4 hours, which would just vaporize an evening or an afternoon. It’s not so much that it was a ton of total time, but it was basically a monolithic chunk of time that wasn’t amenable to being split up to allow something to get done each day. I love you beaglerush, but the streams are just too damn long, so I decided I should stop watching game streams. However, I just felt tired and amenable to bending the rules at approximately the same rate as before, so my behavior didn’t really change.
  • I’m a night owl, to the extent that going to sleep between midnight and 3 AM probably covers 95% of my bedtimes, and the rest are heavily skewed towards after 3 AM. So I started tracking when I went to sleep, and had some friends apply social demerits when I went to sleep late. I got mildly better, but was still all over the place with my sleep schedule.

There’s a happy-ish ending for these habits, but first…

FAILURE at meeting goals

A year ago, I decided to have some resolutions. However, I didn’t want them to be year-long resolutions: a year is a long fucking time, and I knew I’d be pretty susceptible to…

  • falling off the wagon and then not getting back on, burning the rest of the year, or
  • mis-estimating how big a year-sized task would be, which would probably only become apparent near the middle of the year. If I got it really wrong, it would be months before I could properly try again.

So similarly to my newsletter tiers, I decided to break the year into fifths (quinters?), and resolved to do something for each of those. I hoped it would be long enough to actually get something done, while being short enough that I could iterate quickly.

So, how did I do?

Quinter 1

  • FAILURE. Finish the hardware part of project noisEE. The design turned out to be hard, so I did a design Hail Mary that required parts that didn’t arrive before the end of the quinter.
  • Stretch FAILURE. Read all of Jaynes’ Probability Theory. Got only ~40% of the way through: it turns out trying to replicate all the proofs in a textbook is pretty hard.
  • FAILURE. Try to do more city exploratory activities. Planning and executing fun/interesting activities was more time consuming than anticipated, and I didn’t account for how much homebody inertia I harbored and how time consuming the other goals would be.
  • SUCCESS. Keep up all pre-existing habits.

Quinter 2

  • FAILURE. Finish project noisEE. It turns out the Hail Mary design was broken, who could have guessed?
  • SUCCESS (mostly). Make a NAS (network attached storage) box. Technically, the wrap up happened the day after the quinter ended.
  • SUCCESS. Keep up all pre-existing habits. Apparently attaining this goal isn’t a problem, so I stopped keeping track of this in future quinters.

Quinter 3

  • SUCCESS/FAILURE. Finish project noisEE, or know what went wrong while finishing. There was a problem with the 2nd Hail Mary, which I debugged and figured out, but it was expensive to fix, so I didn’t stretch to actually fix it. However, in the next quinter I didn’t respect the timebox, which defeated the entire point of having one[1].
  • FAILURE. Make a feedback widget for meetups. After designing it, I discovered I didn’t want to spend the money to fabricate the feasible “worse is better” solution.
  • SUCCESS. Spend 20 hours on learning how to Live Forever. Spent 30+ hours on research.

Quinter 4

It’s about this time that I started enforcing goal ordering: instead of doing the easiest/most fun thing first, I would try to finish goals in order, so large and time consuming tasks don’t get pushed to the end of the quinter.

  • SUCCESS. Finish ingesting in-progress Live Forever research. Just wanted to make sure momentum continued from the previous quinter so I would actually finish covering all the main points I discovered I wanted to include.
  • SUCCESS (sad). Fix project noisEE, or give up after 4 hours. I gave up after 4 hours, after trying out some hacks.
  • SUCCESS. Write up noisEE project notes. Surprisingly, I had things to say despite not actually finishing the project, making the notes into a mistakes post.
  • FAILURE. Write up feedback widget design for others to follow. For some reason, I ignored my reluctance to actually build the thing and assumed I would instead value writing up how one might build it. Talk about a total loss of motivation.
  • SUCCESS. Write up the Live Forever research results, post about them. Includes practicing presenting the results a number of times.
  • Stretch FAILURE. Prep the meta-analysis checklist. Didn’t have time or the necessary knowledge.

Quinter 5

At this point, I was starting to feel stretched out, so I started building break times into my goal structure.

  • SUCCESS. Prepare to present the Live Forever research. I was probably too conservative here: I had also planned to actually present, and there wasn’t anything foreseeable that would have prevented it from happening.
  • FAILURE. Take a week off project/goal work. I thought I would have only 1 week to prepare to present, but it turned into 2-3 weeks and broke up this break week, which was not nearly as satisfying.
  • SUCCESS. Redesign the U2F Zero to be more hobbyist friendly[2].
  • SUCCESS. Do regular Cloud™ backups[3][4].
  • SUCCESS. Take 1 week off at the end of the year. That’s when I’m writing this post!

Miscellaneous FAILURE

There’s so much FAILURE, I need a miscellaneous category.

Speaking of categories, I was organizing a category theory reading group for the first third of 2017 based on Bartosz’s lectures, but eventually the abstractions on abstractions got to be too much[5] and everything else in life piled on, and we ended up doing only sporadic meetups where we sat around confused before I decided to kill the project. In the end, we FAILED to reach functional programming enlightenment.

I’ve even started to FAIL at digesting lactose. It’s super sad, because I love cheese.

Why was there so much FAILURE this year?

Part of it is that I had more things to FAIL at. For example, I wouldn’t previously keep track of how I was doing at my habits, and color code them so I could just look at my tracker and say “huh, there’s more red than usual”. Or, I wouldn’t previously have the data to say “huh, I went to sleep after 3AM 2 weeks in a row”[6].

And in a way, I eventually succeeded: for each of the habits I listed earlier, I applied the club of Beeminder and hit myself until I started Doing The Thing. Does my reliance on an extrinsic tool like Beeminder constitute a moral failing? Maybe, but the end results are exactly what I wanted:

  • I got super motivated to build up a safety buffer to get into work early (even before getting my sleep schedule together!),
  • only broke Twitch abstinence twice since starting in May[7],
  • immediately went from an average bedtime of 2 AM to almost exactly 12:29 AM[8].

And for goals, I opened myself up to FAILURE by actually making fine-grained goals, which meant estimating what I could do, and tracking whether I actually did them. In a way, there are two ways to FAIL: I could overestimate my abilities, or I could simply make mistakes and FAIL to finish what I otherwise would have been able to do. In practice, it seems like I tended to FAIL by overestimating myself.

It’s pretty obvious in retrospect: I started out by FAILING at everything, and then started cutting down my expectations and biting off smaller and smaller chunks until I actually hit my goals. Maybe I should have built up instead of cutting down, but I wanted to feel badass, and apparently the only way you can do that is by jumping in the deep end, so FAILING over and over it is. On the other hand, I think I just got lucky that I stuck it out until I got it together and started hitting my targets, so if you can do it by building upwards, that might work better.


So going forward what are the things I’d keep in mind when trying to hit goals?

  • Think through more details when planning. Saying “I will do all the proofs in Probability Theory” is fine and good, but there’s only so much time, and if you haven’t worked even one of the proofs, then it’s not a goal, it’s a hope and a prayer. Get some Fermi estimates in there, think about how long things will take and what could go wrong (looking at you, hardware turn-around times[9]).
  • If you’ve never done a similar thing before, then estimating the effort to hit a certain goal is going to be wildly uncertain. Pare the goal way down, because there are probably failure modes you’re not even aware of. For example, “lose 5 pounds” would be a good goal for me, because I’ve fiddled with the relevant knobs before and have an idea about what works. “Make a coat from scratch” is a black box to me, hence not a good goal. Instead, I might aim for “find all the tough parts of making a coat from scratch”, which is more specific, more amenable to different approaches, and doesn’t set up the expectation of some end product that is actually usable[10].
  • Relatedly, 10 weeks (about the length of a quinter) is not a leisurely long time. Things need to be small enough to actually fit, preferably small enough to do in a sprint near the end of the quinter. I know crunch time is a bad habit carried over from my academic years, but old habits die hard, and at least the things get done.
  • Build in some rest. I pulled some ludicrous hours in the beginning of the year, and noticed as time went on that I seemed less able to put in a solid 16 hours of math-working on the weekends. My current best guess is that I haven’t been taking off enough time from trying to Do The Thing, so I’m building in some break times.
  • Don’t throw away time. You’ll notice that I kept the noisEE dream alive for 4 quinters, each time trying tweaks and hacks to make it work. It’s clear now that this is a classic example of the sunk cost fallacy, and that I either should have spent more time at the beginning doing it right, or just let it go at that point.

    Another way to throw away time is to try and do things you don’t want to do. My example is trying to make/post the feedback widget, which is pretty simple, but I discovered I couldn’t give any shits about it after the design phase. This isn’t great, because I said I wanted to do the thing, and not doing the thing means you’re breaking the habit of doing the things you’ve set out to do (from Superhuman by Habit). Unfortunately, I’m still not sure how to distinguish when you really want to do something versus when an easily overridden part of yourself thinks it’s virtuous to want to do something, which is much less motivating.

  • Goal hacks might be useful. Looking at it, the main hack I used was timeboxes, which worked sometimes (total longevity research was within 2x of my timebox estimate) and not so well in others (noisEE overflowed). It seems to be most useful when I’m uncertain how much actual work needs to be done to achieve some goal, but I still want to make sure work happens on it. After working on it for some number of hours, it should be clearer how sizable the task is, and it can get a more concrete milestone in the next round.

    Stretch goals might also work, but designating something a stretch goal seems like a symptom of unwanted uncertainty, and stretch goals tend to be sized such that they never actually get hit. Unless I find myself stagnating, I plan on just dropping stretch goals as a tool.

  • If you’re not doing the thing because of something low-level like procrastination, a bigger stick to beat yourself with might help. Beeminder is my stick of choice, with the caveat that you need to be able to be honest with yourself, and excessive failure might just make you sad, instead of productive.

    (As a counterpoint, you might be interested in non-coercive ways to motivate yourself, in which case you might check out Malcolm Ocean’s blog.)

Despite all the FAILURE, I think I agree with the sentiment of Ray’s post: over the past few years, I’ve started getting my shit together, building the ability to do things that are more complicated than a single-weekend project and the agency to pursue them.

That said, most of the things I finished this year are somewhat ancillary, laying the groundwork for future projects and figuring out what systems work for me. Now that I’ve finished a year testing those systems and have some experience using them, maybe next year I can go faster, better, stronger. Not harder, though, that’s how you burn out.

Well, here’s to 2018: maybe the stage I set this year will have a proper play in the next.

[1]  Thinking about it, timeboxes have two uses. In the first, you want to make a daunting task more tractable, so you commit to only a small timebox, and if you want to keep going afterward, that’s great! In the second, the timebox is used to make sure that some task that would otherwise grow without bound stays bounded. I intended for the noisEE timebox to be used in the bounding fashion, so when I kept deciding to keep working on it, that meant the timebox was broken.

[2]  This project does not have a post yet, and may never have one. Hold your horses.

[3]  Offsite backups are an important part of your digital hygiene, and the Butt is the perfect place to put them.

[4]  If people really want it, I can post about my backup setup.

[5]  Don’t worry, it’s easy, an arrow is like a functor is like an abstract transformation!

[6]  Knowledge is power, France is bacon.

[7]  One of these wasn’t Twitch at all, but a gaming stream I accidentally stumbled across on YouTube; that still counts.


[9]  Unless you’re willing to pay out the nose, getting boards on a slow boat from China takes a while.

[10]  The tradeoff is that the 2nd goal is more nebulous: how do you know that you’ve found all the tough parts of making a coat? Maybe timeboxes would help in this case.

Filed under: Uncategorized

Ain’t No Calvary Coming

Epistemic status[1]: preaching, basically. An apology, in both senses[2].

I know my mom reads my blog; hi, mom.

Mothers being mothers, I figure I owe her a sit-down answer to why I’m not Christian, and why I don’t expect to become Christian again[3]. Now, I don’t expect to convince anyone, but maybe you, dear reader, will simply better understand.

Let’s start at the end.

Let’s start with the agony of hell, and the bliss of heaven. Sure, humans don’t understand infinities, don’t grasp the eye-watering vastness of forever nor the weight of a maximally good/bad time. Nevertheless, young me had an active imagination, so getting people out of the hell column and into the heaven column was obviously the most important thing, which made it surprising that my unbeliever friends were so unconcerned with the whole deal. I supposed that they already had a motivated answer in place: as heathens, they would be wallowing in unrepentant hedonism, and would go to great lengths to make sure they kept seeing a world free of a demanding and righteous God.

I knew the usual way to evangelize, but it depended to a frustrating degree on the person being evangelized to. It seemed unacceptable that some of my friends might go to hell just because their hearts were never in the right place. Well, what if I found a truly universal argument for my truly universal religion? The Lord surely wouldn’t begrudge guidance in my quest to find the unmistakable fingerprints of God (which were everywhere, so the exercise should be a cakewalk), and I would craft a marvelous set of arguments to save everyone.

Early on, I realized that the arguments I found persuasive wouldn’t be persuasive to the people I wanted to reach: if you assumed the Bible was a historical text you would end up saying “no way, Jesus did all these miracles, that’s amazing!”, but what if you didn’t trust the Bible? I would need to step outside of the assumption that God existed, and then see the way back. Was this dangerous to my faith? Well, I would never really leave: I would just be empathetic and step into my friends’ shoes, to better know how to guide them into the light. And you remember the story about walking with Jesus on the beach? There was no way this could go wrong!

Looking back, I see that my thoughts were self-serving. As a product of both faith and science, I wanted to make it clear that religion could meet science on its own terms and win. If the hierarchy of authority didn’t subordinate science to religion, then…?

So I studied apologetics[4], particularly Genesis apologetics. I made myself familiar with things like young vs. old earth creationism, the tornado-in-a-junk-yard equivocation, and the attacks I could make on gradualism and punctuated equilibrium[5]. I was even dazzled by canopy theory, where a high-altitude aerial ocean wrapped the planet, providing waters for The Flood and allowing really long lifespans by blocking harmful solar radiation[6]. I went on missions, raising money and overcoming my natural reticence to talk to people about the Good Word. I even listened almost solely to Christian rock music.

Now, I don’t doubt I believed: I felt the divine in retreats and mission trips, me and my brothers and sisters in Christ singing as one[7]. I prayed for guidance, hung on the words of holy scripture, found the words for leading a group prayer, and eventually confirmed my faith. As part of my confirmation, I remember being baptized for the 2nd time in high school[8]: a clear, lazy river had cut a gorge into sandstone, and the sunset lit the gorge with a warm glow. Moments before I went under the water, I thought “of course. How could I doubt with such beauty in front of me?”.

But some of these experiences also sowed the seeds of doubt. Someone asked if I wanted the blessing of tongues: I said yes, thinking a divine gift of speaking more than halting Spanish would be great for my upcoming mission trip. And, how cool would it be to have a real world miracle happen right in front of me‽ Later I tried to figure out if glossolalia was in fact the tongue of angels[9], but I didn’t come up with anything certain, which was worrying. Why were my local leaders enthusiastic about this “gift of tongues”[10], but other religious authorities were against the practice? On a mission trip I told someone I could stay on missions indefinitely (in classic high school fashion, I had read the word “indefinite” a few times and thought it sounded cool) and was brought up short when they responded with skepticism that someone could stay forever; why wouldn’t they stay if the work was righteous, comfortable living be damned? Or I would think about going to seminary instead of college, and wonder if that was God’s plan for me.

How did I know what was right, what was true?

The thing is that I didn’t even begin to know. On my quest for answers, I didn’t comprehend the sheer magnitude of 2000 years of religious commentary[11]. I didn’t grasp how hairy the family tree of Christian sects was, each with their own tweaks on salvation. I read Mere Christianity and a few books on apologetics, and thought it would be enough. I didn’t even understand my enemy at all, refusing to grapple with something so basically wrong as The Selfish Gene. Into this void on my map of knowledge I sailed, a theological Columbus, expecting dragons where there was a whole continent of thought.

So the more I learned, the more doubt compounded. When my church split, I wondered why such a thing could happen: were some of the people simply wrong about a theological question? That raised more disturbing questions about how one could choose the truest sect of Protestant Christianity, ignoring “cults” like Mormonism or Catholicism or Eastern Orthodox or even other religions entirely, like Islam (and there are non-Abrahamic religions, too‽). Or, maybe a church split could happen for purely practical concerns, but it was disturbing that such an important event in a theological institution wasn’t grounded in theological conflict: if not a church split, then what should be determined by theology?[12] And, I realized other religions had followers with similarly intense experiences: what set mine apart from theirs?

Again, what did I know, and how did I know it?

Don’t worry, my spiritual leaders would say. God(ot) is coming, just wait here by this tree and he’ll be along any moment now[13].

And maybe God would come, but he would maintain plausible deniability, an undercover agent in his own church. Faith healings wouldn’t do something so visible as give back an arm, just chase away the back pain of a youth leader for a while. My church yelled prayers over a girl with a genetic defect, and the only outcome was frightening her[14]. Demonic possession leading to supernatural acts isn’t a recorded phenomenon, despite the proliferation of cameras everywhere. So the whispers of godhood would always scurry behind the veil of faith whenever a light of inquiry shone on it.

I started refusing to stand during praise. Singing with this pit of questions in my stomach seemed too much like betrayal, displaying to the world smiles and melodies I knew were empty. I sat and thought instead, trying to retrace Kant’s Critique of Pure Reason without Kant’s talent[15]. I simply couldn’t accept the dearth of convincing evidence and just trust, when all my instincts and training screamed for a sure foundation, when I knew a cosmic math teacher would circle my answer of “yes, God exists!” and scribble in red “please show your work”.

I told myself I would end it in a blaze of glory, pledging fealty to a worthy Lord, or flinging obscenities at the sky and pulpit when they didn’t have the answers. Instead, my search for god outside of god himself petered out under a pile of unanswered questions[16], and I languished in a purgatory of uncertainty. In a way, I was mourning the death of god. It took years, but now I can confidently say I’m an atheist.

So that was the past. What about the future?

Sometimes the prodigal son falls on hard times and has to come home; in the case of the church, home has a number of benefits. Peace of mind that everything will turn out okay. A sabbath, if one decides to keep it. A set of meditation-like practices at regular intervals (even in Christianity!). A set of high-trust social circles[17] with capped vitriol (in theory; in practice, see the Protestant Reformation and aforementioned church splits), a supportive community with a professional leader, a time to all feel together. Higher levels of conscientiousness. Higher productivity[18]. The ability to attract additional votes in Congressional races. Chips at the table of Pascal’s Wager[19].

Perhaps most importantly, though, is a sense of hope. How does one have hope for the future when there is only annihilation at the end?

Paul saw the end, a world descending into decadence, a world that couldn’t save itself: hell, given a map, it wouldn’t save itself. Contrary to this apocalyptic vision, scientism[20]/liberalism preaches abundance, the continual development of an ever better world. We took the limits of man and sundered them; we walked on the moon, we eradicated polio, we tricked rocks into thinking for us, and we’ll break more limits before we’re done. Paul was the product of an endless cycle of empires; we’re on a trajectory to leave the solar system[21].

There is light in the world[22], and it is us.

But if the world is simply getting better, then does it matter what I believe? Well, our rise is only part of the story: it took tremendous work to get from where we were to where we are, and the current world is built on the blood of our mistakes[23]. The double-edged sword of technology could easily lop off our hand if we’re not careful. We’ve done some terrible things already, and finding the Great Leap Forward-scale mistakes with our face is hideously expensive.

So progress is possible, but we haven’t won. How do the engineers say it? “Hope is not a strategy.” There ain’t no Calvary coming[24], ain’t no Good King to save us, ain’t no cosmic liquidation of the global consciousness, ain’t no millennium expiration date on suffering. A reductionist scientific world is a cold world without guardrails, with nothing to stop us from destroying ourselves[25]: if we want a happy ending, we’ll need to breach Heaven ourselves, and bowing our heads and closing our eyes in prayer won’t help when we should be watching the road ahead. It’s going to be a lot of hard work, but this isn’t a cause for despair. This is a call to arms.

So in the past, a successful prodigal son may have gone home for a sense of continuity and purpose, a sense of hope beyond the grave. However, now he doesn’t have to. It’s not just about unrepentant hedonism[26]: we’re getting closer to audacious goals like ending poverty, ending aging, ending death. We won’t wait for a bright new afterlife that isn’t coming: we humanists will do our best, and maybe, just maybe, it will be enough.

No heaven above, no hell below, just us. Let us begin.

[1]  Epistemics: the ability to know things. Epistemic status: how confident I am about the thing I am writing about.

[2]  The two senses: saying sorry, and apologetics, i.e. defending a position. Commonly found as the bi-gram “Christian apologetics”.

[3]  I almost didn’t publish this post, figuring I hadn’t heard from my mom about faith-related topics in a while. Then my mom told all my relatives “We are praying for a godly young woman who can bring <thenoviceoof> back to us”, so here we are.

[4]  A defense of the faith, basically, usually hanging around as a bi-gram like “Christian apologetics”. See Wikipedia.

[5]  Standing from where I am, I can see how the books would paint the strengths of science as weakness: “look at how science has been wrong! And then it changed its mind, like a shifty con-man!” In this respect, the flip-flopping nature of science journalism in fields like nutrition is Not Helping, a way of poisoning the well of confident proclamations of evidence, such that everyone defaults to throwing up their hands in the face of evidence, instead of actually assessing it.

[6]  In retrospect, I had a thing for weirdly implausible theories: I remember being smitten with the idea that all of physics could be explained by continually expanding subatomic particles, a sort of classical Theory of Everything that no one asked for, with at least one gaping hole you could drive trucks through (hint: how do satellites work?).

[7]  We even cautioned ourselves against “spiritual highs”. We would feel something, but the something wouldn’t always be there, which maybe should have tipped me off about something fishy happening. How do they say it, “don’t get high off your own supply”?

[8]  Many children are baptized soon after birth, and confirmed at some later age when they can actually make decisions. Hmm.

[9]  Now, I know that I could tell by listening for European capitals.

[10]  I didn’t actually get to the point of spewing glossolalia: I could hear my youth group leader’s disappointment that I didn’t quite let myself go while repeating “Jesus, I love you” faster than I could speak. And, finding out that no earthly audience would have understood what I was saying was also a shock, like finding out God solely communicated to people through grilled cheeses.

[11]  Talk about being bad at grasping infinities: I couldn’t even grasp 2000 years. “More things than are dreamt of in your philosophy”, etc.

[12]  The obvious rejoinder is that the church is still an earthly institution, and it’s still subject to mundane concerns like balancing the budget: for every Protestant Reformation grounded in theological conflict, there’s another hundred grounded in conflicts over the size of the choir, all because we live in a fallen world. The general counter-principle is that if there’s no way to tell from the behavior of churches whether we’re in a godly or godless world, then the fact there exists a church ceases to count as evidence.

[13]  The fact that some biblical scholars translate “cross” as “tree” makes me suspicious that Waiting for Godot was in fact making this exact reference.

[14]  I didn’t partake; this was after I started being weirded out by the charismatics.

[15]  I’m disappointed I didn’t throw up my hands at some point and yell “I Kant do it!”

[16]  Sure, there were answers, but they weren’t satisfying. You couldn’t get there from here.

[17]  Of course, the trust comes at a price; I wouldn’t want to be trans in a small tight-knit fundamentalist town.

[18]  It’s not clear from the abstract of the paper, but in Age of Em Robin Hanson cites this paper as showing the religious have higher productivity.

[19]  Mostly not serious, since I would expect a jealous Abrahamic God to throw out any spiritual bookies. Also keep in mind that Pascal’s wager falls apart even with the simple addition of multiple gods competing for faith.

[20]  I am totally aware that scientism is normally derogatory. However, science itself doesn’t require the modes of thought that we normally attribute to our current scientific culture.

[21]  One might worry that we would simply export our age-old conflicts and flaws to the stars, in which case they might become… bear with me… the Sins of a Solar Empire?

[22]  “Run for the mountains!” said Apostle Paul. “It is the dawn of the morning Son!” Then Oppenheimer said “someone said they were looking for a dawn?”

[23]  Sapiens notes “Haber won the Nobel prize in chemistry. Not Peace.”

[24]  I’m sorry-not-sorry about the pun. If you don’t get it, Calvary is the hill Jesus supposedly died on, and “ain’t no cavalry coming” is a military saying: there’s no backup riding in to save the day.

[25]  Nukes are traditional, if less concerning these days. Pandemics are flirting on the edge of global consciousness, AI is getting more serious, and meta-things like throwing away our values and producing a “Disneyland without children” are becoming more concerning.

[26]  Just look at what the effective altruists are doing with their 10%.

Filed under: Uncategorized

The Mundane Science of Living Forever

Epistemic Status: timeboxed research, treat as a stepping stone to more comprehensive beliefs. Known uncertainty called out.

Live forever, or die trying!

Previously: Lifestyle interventions to increase longevity @ LessWrong, 9s of cats.


Yes, Immortality

I wrestled with whether to shoot for a more normal and mundane title, like “In Pursuit of Longevity”, but “live a long time!” just doesn’t have the ring that “live forever!” does.

Clarification: I don’t have the Fountain of Youth. I’m relying on the future to do the heavy lifting. Kurzweil’s idea of escape velocity is the key: we want to live long enough that life expectancy starts increasing by more than 1 year per year. Life expectancy is currently stagnant, so we want to live as long as possible to maximize our chances of hitting some sort of transition.
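
To make the “more than 1 year per year” threshold concrete, here is a deliberately crude toy model in Python, an illustration under loudly stated assumptions and not an actuarial calculation: each calendar year uses up 1 year of remaining life expectancy, and medical progress hands back a constant `gain_per_year` (a hypothetical parameter; real gains are neither constant nor known in advance). The function name and inputs are made up for this sketch.

```python
from typing import Optional


def years_until_expectancy_runs_out(
    remaining_expectancy: float,
    gain_per_year: float,
    horizon_years: int = 1000,
) -> Optional[int]:
    """Toy model of escape velocity (hypothetical, illustrative only).

    Each calendar year consumes 1 year of remaining life expectancy, while
    progress adds `gain_per_year` years back. Returns the number of years
    until the remaining expectancy reaches zero, or None if it never does
    within `horizon_years` (i.e. progress has reached escape velocity).
    """
    remaining = remaining_expectancy
    for year in range(1, horizon_years + 1):
        remaining += gain_per_year - 1.0  # net change in remaining expectancy
        if remaining <= 0:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical starting point: 40 years of remaining expectancy.
    for gain in (0.0, 0.5, 1.0):
        print(gain, years_until_expectancy_runs_out(40, gain))
```

With 40 years remaining, no progress runs out after 40 years and half a year of gain per year stretches that to 80, but once the gain reaches 1 year per year the remaining expectancy never hits zero. That is why buying time with boring interventions matters even if they don’t solve aging themselves.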

In other words, we need silver bullets to overcome the Gompertz curve, but there are no silver bullets yet, just boring old lead bullets. We’ll have to make our own silver bullet factory, and use the lead bullets to get there.
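
For reference, the Gompertz curve models the yearly risk of death (the hazard) as growing exponentially with age, μ(x) = a·e^(bx), which makes the hazard double every ln(2)/b years. Below is a minimal Python sketch of that textbook model. The parameter values `A` and `B` are hypothetical placeholders picked for illustration, not fitted to any real population, and the pure Gompertz form ignores the age-independent deaths (accidents and so on) that fuller models add.

```python
import math


def gompertz_hazard(age: float, a: float, b: float) -> float:
    """Annual mortality hazard under a Gompertz model: mu(x) = a * exp(b * x)."""
    return a * math.exp(b * age)


def gompertz_survival(age: float, a: float, b: float) -> float:
    """Probability of surviving from birth to `age` under a pure Gompertz hazard.

    Integrating the hazard gives S(x) = exp(-(a / b) * (exp(b * x) - 1)).
    """
    return math.exp(-(a / b) * (math.exp(b * age) - 1.0))


def hazard_doubling_time(b: float) -> float:
    """Years it takes for the Gompertz hazard to double: ln(2) / b."""
    return math.log(2) / b


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration only.
    A, B = 1e-4, 0.085
    print(f"hazard doubles every {hazard_doubling_time(B):.1f} years")
    for age in (30, 60, 90):
        print(age, round(gompertz_hazard(age, A, B), 4), round(gompertz_survival(age, A, B), 3))
```

In this framing, a lead bullet nudges `a` or `b` a little, while a silver bullet would have to break the exponential shape itself.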

So, the bulk of this post will be devoted to simply living healthily. A lot of the advice is boring and standard: eat your vegetables, exercise, get enough sleep. However, I wanted to check out the science and see what holds up under (admittedly amateur) scrutiny.

(I’ll be ignoring the painfully obvious things, like not smoking. If you’re smoking, stop smoking[1].)

My process: I timeboxed myself to 20 hours of research, ending in August 2017. First, I looked up the common causes of death and free-form generated possible interventions. Then, I followed the citations in the Lifestyle interventions to increase longevity post, searched Google Scholar (especially for meta-analyses), and read the studies, evaluating them in a non-rigorous way. I discarded interventions I wasn’t certain about: for example, Sarah lists some promising drugs and gene therapies, but they’re based only on animal studies, and I wanted more certainty. I ended up using 30+ hours, so not everything is as exhaustively researched as I would like: for example, there was a fair amount of abstract skimming, and I did not read every paper I reference end-to-end. On the other hand, many papers were locked behind paywalls, so I couldn’t do much more than that.

This means if you read one of these results and implement it without talking to your doctor about it and bad things happen to you, I will ask you: ARE YOU A SPRING LAMB? WHY THE FUCK ARE YOU DOING THINGS A RANDOM PERSON ON THE INTERNET TOLD YOU TO DO? AND WITHOUT VETTING THOSE THINGS?

Or more concretely: you are a unique butterfly, and no one cares except the medical world. What happens for the faceless statistical masses might not happen for you. I will not cover every single possible interaction and caveat, because that is what those huge medical diagnosis books are for, and I don’t have the knowledge to tell you about the contents of those books. Don’t hurt yourself, ask your doctor.

An example: blood donation

First, I wanted to lead with an example of how the wrong methods can cripple a conclusion and end up with bad results.

Now, blood donation looks like it is very, very good for male health outcomes. From “Blood donation and blood donor mortality after adjustment for a healthy donor effect.” (2015, N=1,182,495; note it’s just an abstract, but the abstract has the data we want):

» For each additional annual blood donation, the all-cause mortality RR (relative risk) is 0.925, with a 95% CI (confidence interval) from 0.906 to 0.943[2]. I’ll be summarizing this information as RR = 0.925[0.906, 0.943] throughout the post.

(Unless otherwise stated, in this post an RR measure will refer to all-cause mortality, and X[Y, Z] CI reports will be values followed by 95% confidence intervals. There will also be references to OR (odds ratio) and HR (hazard ratio)).
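To make the RR = X[Y, Z] notation concrete, here’s a minimal sketch of how a crude relative risk and its 95% CI come out of a 2x2 table, using the standard log-scale (Katz) interval. The counts are hypothetical, and the studies in this post use adjusted models, so their published numbers can’t be reproduced this way.

```python
import math


def relative_risk(events_exposed, total_exposed, events_control, total_control, z=1.96):
    """Crude relative risk with a log-scale (Katz) confidence interval from 2x2 counts."""
    rr = (events_exposed / total_exposed) / (events_control / total_control)
    # Standard error of ln(RR) for the Katz log method.
    se_log_rr = math.sqrt(1 / events_exposed - 1 / total_exposed
                          + 1 / events_control - 1 / total_control)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


def excludes_no_effect(low, high):
    """True when the whole interval sits on one side of RR = 1.0."""
    return high < 1.0 or low > 1.0


# Hypothetical counts: 90 deaths among 1,000 exposed vs 100 among 1,000 controls.
rr, low, high = relative_risk(90, 1000, 100, 1000)
print(f"RR = {rr:.3f}[{low:.3f}, {high:.3f}], excludes 1.0: {excludes_no_effect(low, high)}")
```

With these made-up counts the point estimate suggests a 10% reduction, but the interval straddles 1.0, so it can’t distinguish a small benefit from no effect at all.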

There’s even a well-fleshed-out mechanism: iron ends up oxidizing parts of the cardiovascular system and damaging them, so regular blood donation should help by removing excess iron from the blood.

But there are some possible confounders:

  • blood donation carries some of the most stringent health screens most people face, which results in a healthy donor effect,
  • altruism could be correlated with conscientiousness, which might affect health outcomes.

The study cited earlier is observational: they’re looking at existing data gathered in the course of normal donation and studying it to see if there’s an effect. In order to make a blanket recommendation that men should donate blood at some regular interval, what we really want is to isolate the effect of donation: put people through the normal intake and screening process, and then, right before the needle goes in, randomize whether the donation is actually taken, or even insert the needle without drawing any blood.

(Note that randomization is not strictly better than observational studies: observations can provide insights that randomization would miss[3], and a rigorous RCT might not match real world implementations. Nevertheless, most of the time I want a randomized trial.)

No one has done an RCT (randomized controlled trial) in this fashion, and I expect any such study would have a really hard time passing an ethics board, given that I get numerous calls throughout the year asking for help with emergency blood shortages.

However, Quebec noticed that their screening procedures were too strict: a large group of people were being rejected when they were in fact perfectly healthy. The rejection trigger didn’t appear to otherwise correlate with health, so this was about as good a randomized experiment as we were going to get. Their results were reported in “Iron and cardiac ischemia: a natural, quasi-random experiment comparing eligible with disqualified blood donors” (2013, N=63,246):

» Donors vs nondonors, RR = 1.02[0.92, 1.13]

In other words, there was basically no correlation. In fact, in another section of the paper the authors could make the correlation reappear by slicing their data in a way that mimicked the healthy donor selection process.

The usual weaknesses laypeople can pick apart aren’t there: the N is large, there’s a broad cross-section of the community (no elderly Hispanic women effect), and there’s no way to fudge our interpretation of the numbers. We’re not beholden to science’s fetish with p=0.05, so a 95% CI that crosses 1.0 could be okay if the estimate were definitely leaning in the right direction. But the point estimate sits almost exactly on 1.0. The effect either isn’t there or is so tiny that it’s not worth considering.

So that’s an example of how things can look like great interventions, and then turn out to have basically no effect. With our skeptic hats firmly in place, let’s dive into the rest!

Easy, Effective

Vitamin D

Vitamin D gets the stamp of approval from both Cochrane and Gwern[4]. Lots of big randomized studies have been done with vitamin D supplementation, so the effect size is pretty pinned down.

From “Vitamin D supplementation for prevention of mortality in adults” (2012, N=95,286, Cochrane):

» Supplementation with vitamin D vs none, RR = 0.94[0.91, 0.98]

Another meta-analysis used by Gwern, “Vitamin D with calcium reduces mortality: patient level pooled analysis of 70,528 patients from eight major vitamin D trials” (2012, N=70,528):

» Supplementation with vitamin D vs none, HR = 0.93[0.88, 0.99]

You might think that one end of the CI is pretty bad, since RR = 0.98 means the intervention is almost the same as the control. On the other hand, (1) wait until you read the rest of the post, and (2) keep in mind that vitamin D is very cheap to supplement: your local drugstore probably has a year’s worth for $20. In a pinch, more sunlight also works, but if you have darker skin, sunlight is less effective.

If you’re interested, there’s lots of hypothesizing on the mechanisms by which more vitamin D impacts things like cardiovascular health (overview).

(If you want a striking visual example of vitamin D precursors correlating with cancer, there’s a noticeable geographic gradient in certain cancer deaths; “An estimate of premature cancer mortality in the U.S. due to inadequate doses of solar ultraviolet-B radiation” (2002) states that some cancers are twice as prevalent in the northern US as in the southern. There’s more sun in the south, and sunlight helps synthesize vitamin D. Coincidence?! If you want to, you can see this effect yourself by going to the Cancer Mortality Maps viewer from the National Cancer Institute and looking at bladder, breast, corpus uteri, or rectum cancers.)

Difficult, but Effective

Exercise

Exercising is hard work, but it pays off big.

From “Domains of physical activity and all-cause mortality: systematic review and dose–response meta-analysis of cohort studies” (2011, N=unknown subset of 1,338,143[5]):

» Comparing people that get 300 minutes of moderate-vigorous exercise/week vs sedentary populations, RR = 0.74[0.65, 0.85]

Unfortunately, “moderate-vigorous” is pretty vague, and the number of multiple comparisons being made is breathtaking.

MET-h is a unit of energy expenditure roughly equivalent to sitting and doing nothing for an hour. Converting different exercises (or intensities of exercise) to MET-h allows directly comparing and aggregating different exercise data. It also makes it easier to pin down what “moderate-vigorous” exercise is: roughly, below 3 METs is light, 3-6 METs is moderate, and above 6 METs is vigorous.
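A minimal sketch of the MET-h bookkeeping, using the intensity bands above. The MET values assigned to each activity here are placeholder assumptions, not reference figures; look up the actual values for your activity before relying on them.

```python
# Placeholder MET intensities (assumptions for illustration, not reference values).
ACTIVITY_METS = {"brisk walk": 4.0, "run": 8.0}


def met_hours(mets, minutes):
    """Energy expenditure in MET-h: intensity (METs) times duration (hours)."""
    return mets * minutes / 60.0


def intensity_band(mets):
    """Classify intensity with the rough cutoffs: <3 light, 3-6 moderate, >6 vigorous."""
    if mets < 3:
        return "light"
    if mets <= 6:
        return "moderate"
    return "vigorous"


weekly_log = [("brisk walk", 150), ("run", 60)]  # (activity, minutes per week)
weekly_total = sum(met_hours(ACTIVITY_METS[name], mins) for name, mins in weekly_log)
print(f"{weekly_total:.1f} MET-h/week, {weekly_total / 7:.2f} MET-h/day")
for name, _ in weekly_log:
    print(name, "->", intensity_band(ACTIVITY_METS[name]))
```

Note how an hour at a 4 MET intensity comes out to 4 MET-h, which is where the “+4 MET-h/day is roughly 1h of moderate exercise” mapping comes from.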

With this in mind, we can run a regression on how additional MET-h affect RR. From the previous study (2011, N=unknown subset of 844,026):

» +4 MET-h/day, RR = 0.90[0.87, 0.92] (roughly mapping to 1h of moderate exercise)

» +7 MET-h/day, RR = 0.83[0.79, 0.87] (roughly mapping to 1h vigorous exercise)
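If we assume the dose-response is log-linear (each extra unit of activity multiplies risk by the same factor), the two estimates above are consistent with each other: scaling the +4 MET-h/day RR up to +7 MET-h/day lands on roughly the second number. This is a sketch under that assumption; the `floor` parameter is a hypothetical, crude stand-in for the point where more exercise stops helping.

```python
def rr_for_dose(rr_per_unit, unit, dose, floor=0.0):
    """Log-linear dose-response: risk is multiplied by rr_per_unit for every `unit` of dose.

    `floor` is a crude, hypothetical cap on the benefit (the result never goes below it).
    """
    return max(rr_per_unit ** (dose / unit), floor)


predicted = rr_for_dose(0.90, unit=4.0, dose=7.0)
print(f"+7 MET-h/day extrapolated from the +4 MET-h/day estimate: RR ~ {predicted:.2f}")
```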

There’s a limit, though: exercising for too long, or too hard, will eventually stop providing returns. The same study places the upper limit at around a maximum RR = 0.65 when comparing the highest and lowest activity levels. The Mayo Clinic in “Exercising for Health and Longevity vs Peak Performance: Different Regimens for Different Goals” recommends capping vigorous exercise at 5 hours/week for longevity.

A quick rule of thumb is that each hour of exercise can return 7x time dividends (news article). This sounds great, but do some math: put this return together with the 5 hours/week limit, assume that you’re 20yo and doing the maximum exercise you can until 60, and this works out to adding roughly 8 years to your life (note that the study the rule of thumb is based on (2012) gives a slightly lower average maximum gain, around 7 years). Remember the Gompertz curve? We can huff and puff to get great RRs, and it only helps a bit. Unfortunate.
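The arithmetic behind the roughly 8 years, assuming 52 weeks a year and counting calendar years as 365.25 days of 24 hours:

```python
HOURS_PER_WEEK = 5        # suggested cap on vigorous exercise
YEARS_EXERCISING = 60 - 20
RETURN_MULTIPLIER = 7     # rule of thumb: each exercise hour returns ~7 hours of life

hours_exercised = HOURS_PER_WEEK * 52 * YEARS_EXERCISING
hours_gained = hours_exercised * RETURN_MULTIPLIER
years_gained = hours_gained / (24 * 365.25)
print(f"{hours_exercised} h exercised -> ~{years_gained:.1f} extra years")
```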

(While we’re exercising: keep in mind that losing weight isn’t always good. If you’re already at a healthy weight and start losing weight without intending to, that could be a sign that you’re sick and don’t know it yet (source).)

Other studies I looked at:

Unfortunately, most of these studies are based on surveys, which have the usual problems with self-reports. There are some studies that measure VO2max more rigorously as a proxy for fitness, but those have tiny Ns, in the tens if they’re lucky (it’s expensive to measure VO2max!).

Diet

Overall, many of these studies are observational and based on self-reports; a few randomize participants onto provided food, but the economics dictate smaller Ns. I’ve put all the diet-related things together, since in aggregate they are fairly impactful (if difficult to put into practice), but note that some of the subheadings contain less certain results.

Fruit and vegetables

It’s like your childhood authority figures said: eat your vegetables.

From “Fruit and vegetable consumption and mortality from all causes, cardiovascular disease, and cancer: systematic review and dose-response meta-analysis of prospective cohort studies” (2014, N=833,234):

» +1 serving fruit or vegetable/day, HR = 0.95[0.92, 0.98]

Like exercise, fruits/vegetables don’t stack forever either; there’s around a 5 serving/day limit after which effects level off. Still, that adds up to around HR = 0.75, competitive with maximally effective exercise.
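A quick check of the compounding, assuming each daily serving multiplies the hazard by 0.95 and that the benefit stops entirely at 5 servings/day (a hard cap is a simplification of the leveling off): five servings give 0.95^5, about 0.77, in the same neighborhood as the figure above.

```python
def hr_for_servings(servings, hr_per_serving=0.95, cap=5):
    """Hazard ratio vs zero servings/day, compounding per serving up to a hard cap (an assumption)."""
    return hr_per_serving ** min(servings, cap)


for servings in (1, 3, 5, 8):
    print(servings, "servings/day -> HR ~", round(hr_for_servings(servings), 2))
```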

Potatoes are a notable exception, having a uniquely high glycemic load among vegetables; this roughly means that your blood sugar will spike after eating potatoes, which seems bad. You can find plenty of debate about whether this is in fact bad[6].

Other reports I looked at:

Red/Processed Meat

You know bacon is bad for you, but… bacon is pretty bad for you.

From “Red Meat and Processed Meat Consumption and All-Cause Mortality: A Meta-Analysis” (2013, N=unknown subset of 1,330,352), covering effects from both plain red meat (hamburger, steak) and processed red meat (dried, smoked, bacon):

» Highest vs lowest consumption categories[7] for red meat, RR = 1.10[0.98, 1.22]

» Highest vs lowest consumption categories for processed red meat, RR = 1.23[1.17, 1.28]

I couldn’t find all-cause data on fried foods specifically, but “Intake of fried meat and risk of cancer: A follow-up study in Finland” covers cancer risks (1994, N=9,990):

» Highest vs lowest tetrile fried meat: RR = 1.77[1.11, 2.84]

Note that the confidence intervals are wide: for example, the red meat CI covers 1.0, which is pretty poor (and yet the best all-cause data I could find). If we were strictly following NHST (null hypothesis significance testing), we’d reject this conclusion. However, I’ll begrudgingly accept waggled eyebrows and “trending towards significance”[8].

If you’re paleo, you might not have cause to worry, since you’re probably eating better than most other red meat eaters, but I have no data for your specific situation.

Other reports I looked at:

Fish (+Fish oil)

Fish is pretty good for you! Fish oil might contribute to fish “consumption”.

“Risks and benefits of omega 3 fats for mortality, cardiovascular disease, and cancer: systematic review” (2006, N=unknown subset of 36,913) looked at both fish consumption and fish oil, finding that fish/fish oil weren’t significantly different:

» High omega-3 (both advice to eat more fish, and supplementation) vs low, RR = 0.87[0.73, 1.03]

Note this analysis only included RCTs.

“Association Between Omega-3 Fatty Acid Supplementation and Risk of Major Cardiovascular Disease Events: A Systematic Review and Meta-analysis” (2012, N=68,680) looked only at fish oil supplementation:

» Omega-3 supplementation vs none, RR = 0.96[0.91, 1.02]

Note that both of these results have relatively wide CIs covering 1.0. Additionally, the two studies seem to differ on the relative effectiveness of fish oil.

There’s plenty of exposition on mechanisms for why fish oil (omega-3 oil) might help in the AHA scientific statement “Fish Consumption, Fish Oil, Omega-3 Fatty Acids, and Cardiovascular Disease”.

Also make sure that you’re not eating mercury laden fish while you’re at it; just because Newton did it doesn’t mean you should.

Other studies I looked at:

Nuts

This study of Seventh-day Adventists, “Nut consumption, vegetarian diets, ischemic heart disease risk, and all-cause mortality: evidence from epidemiologic studies”, points in the right direction (1999, N=34,198):

» Eating nuts >=5 times/week vs <1 time/week, fatal heart attack RR ~ 0.5[0.32, 0.75] (estimated from a graph)

However, I don’t trust it. Look at how implausibly low that RR is: eating nuts is better than getting the maximum benefit from exercise? How in the world would that work? Unfortunately, I wasn’t able to find any studies that weren’t confounded by religion, so I just have to stay uncertain for now.

Sleep

We spend a third of our lives asleep, so of course it matters. The easiest thing to measure about sleep is its length, so plenty of studies have been done on that. You want to hit a Goldilocks zone of sleep length, neither too short nor too long. The literature calls this, aptly, a U-shaped response.

What’s too short, or too long? It’s frustrating, because one study’s “too long” can be another study’s “too short”, and vice versa.

However, from “Sleep Duration and All-Cause Mortality: A Systematic Review and Meta-Analysis of Prospective Studies” (2010, N=1,382,999):

» Too short (<4-7h), RR = 1.12[1.06, 1.18]

» Too long (>8-12h), RR = 1.30[1.22, 1.38]

And from “Sleep duration and mortality: a systematic review and meta-analysis” (2009, N=unknown):

» Too short (<7h), RR = 1.10[1.06, 1.15]

» Too long (>9h), RR = 1.23[1.17, 1.30]

So there’s a range right around 8 hours that most studies can agree is good.

You might be fine outside the Goldilocks zone, but if you haven’t made a special effort to get into it, it may be worth aiming for the 7-9h range the studies can generally agree on.

Again, most of these studies are survey based. I can’t find the source, but a possible unique confounder is that sleeping unusually long might be a dependent, not independent variable: if you’re sick but don’t know it, one symptom could manifest as sleeping more.

And, if you get enough sleep but feel groggy, you might want to get checked out for sleep apnea.

Other studies I looked at:

Less Effective

Flossing

The original longevity guide was enthusiastic about flossing. Looking at “Dental Health Behaviors, Dentition, and Mortality in the Elderly: The Leisure World Cohort Study” (2011, N=690), it’s hard not to be:

» Among daily brushers, never vs everyday flossers, HR = 1.25[1.06, 1.48]

Even more exciting are the dental visit results (N=861):

» No dental exams vs twice/year, HR = 1.48[1.23, 1.79]

However, the study primarily covers the elderly, with an average age of 81yo. Sure, one hopes the effects are universal, but the non-representative population makes that hard to assume. So while flossing looks good, I’m not ready to trust one study, especially when I can’t find a reasonable meta-analysis covering more than a few hundred people.

As a counterpoint, Cochrane looked at flossing specifically in “Flossing to reduce gum disease and tooth decay” (2011, N=1083), finding that there’s weak evidence for reduction in plaque, but basically nothing else.

I’ll keep flossing, but I’m not confident about the impact of doing so.

Other studies I looked at:

Sitting

Sitting down all day might-maybe-possibly be bad for health outcomes.

There are some studies trying to measure the impact of sitting length. From “Daily Sitting Time and All-Cause Mortality: A Meta-Analysis” (2013, N=595,086):

» Each +1 hour/day of sitting, among those sitting >7 hours/day, HR = 1.05[1.02, 1.08]

However, the aptly named “Does physical activity attenuate, or even eliminate, the detrimental association of sitting time with mortality? A harmonised meta-analysis of data from more than 1 million men and women” (2016, N=1,005,791, no full text) claims the correlation only holds at low levels of activity: once people start getting close to the exercise limit, this study found the correlation between sitting and all-cause mortality disappeared.

From “Leisure Time Spent Sitting in Relation to Total Mortality in a Prospective Cohort of US Adults” (2010, N=53,440):

» Sitting >6 hours vs <3 hours/day (leisure time), RR = 1.17[1.11, 1.24]

Note that this is the effect for men: the effect for women is larger. Also, this study directly contradicts the other study, claiming that sitting time has an effect on mortality regardless of activity level. And who in the world sits for less than 3 hours/day during their leisure time? Do they just not have leisure time?

Again, these studies were survey based.

The big unanswered question in my mind is whether vigorous exercise simply cancels out the harm of sitting. So I’m not super confident you should get a fancy sit-stand desk.

(However, I do know that writing this post meant so much sitting that my butt started to hurt, so even if it’s not for longevity reasons I’m seriously considering it.)

Other reports I looked at:

Air quality

Air quality has a surprisingly small impact on all-cause mortality.

From “Meta-Analysis of Time-Series Studies of Air Pollution and Mortality: Effects of Gases and Particles and the Influence of Cause of Death, Age, and Season” (2011, N=unknown (but aggregated from 109 studies(?!))):

» +31.3 μg/m³ PM10, RR = 1.02[1.015, 1.024]

» +1.1 ppm CO, RR = 1.017[1.012, 1.022]

» +24.0 ppb NO2, RR = 1.028[1.021, 1.035]

» +31.2 ppb O3 daily max, RR = 1.016[1.011, 1.020]

» +9.4 ppb SO2, RR = 1.009[1.007, 1.012]

(I’m deriving the RR from percentage change in mortality.)

By themselves, the per-increment RRs aren’t overwhelming. But since they’re expressed per increment, if a normal day contained 50 increments that we could filter out ourselves, that would add up to some real impact. However, the increments aren’t tiny compared to absolute values. For example, here are the maximum values in NYC during the 2016 summer:

PM10 ~ 58 μg/m³

CO ~ 1.86 ppm

NO2 ~ 60.1 ppb

O3 ~ 86 ppb

SO2 ~ 7.3 ppb

So the difference between a heavily trafficked metro area and a clean room is maybe twice the percentage impacts we’ve seen, which just doesn’t add up to very much. Beijing is another story, but even then I (baselessly) question the ability of household filtration systems to make a sizable dent in interior air quality.
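To put numbers on that, here’s a sketch that converts the NYC maximums into “increments” and extrapolates each per-increment RR log-linearly from a zero baseline. Both choices are assumptions: real exposure-response curves may not be log-linear, a zero-pollution baseline is unrealistic, and the pollutants are correlated, so the per-pollutant RRs shouldn’t simply be multiplied together.

```python
# (increment size, RR per increment) from the meta-analysis above.
PER_INCREMENT = {
    "PM10": (31.3, 1.020),
    "CO": (1.1, 1.017),
    "NO2": (24.0, 1.028),
    "O3": (31.2, 1.016),
    "SO2": (9.4, 1.009),
}
# NYC summer 2016 maximums, in the same units as the increments.
NYC_MAX = {"PM10": 58.0, "CO": 1.86, "NO2": 60.1, "O3": 86.0, "SO2": 7.3}


def rr_from_percent_change(pct):
    """A +2% change in mortality corresponds to RR = 1.02."""
    return 1 + pct / 100


def implied_rr(pollutant, concentration):
    """Log-linear extrapolation (an assumption) of the per-increment RR to a concentration."""
    increment, rr = PER_INCREMENT[pollutant]
    return rr ** (concentration / increment)


for name, level in NYC_MAX.items():
    increments = level / PER_INCREMENT[name][0]
    print(f"{name}: {increments:.1f} increments, RR ~ {implied_rr(name, level):.3f} vs zero")
```

Even under these generous assumptions, every pollutant stays within a few increments and single-digit percentage changes.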

There are plenty of possible confounders: these studies appear to be run by taking city-level pollution and mortality data and running regressions on those data points.

Other studies I looked at:

Hospital Choice

Going to the hospital isn’t great: medical professionals do the best they can, but they’re still human and can still screw up. It’s just that the stakes are really high. Like, people recommend marking on yourself which side a pending operation should be done on, to reduce chances of catastrophic error.

Quantitatively, “A New, Evidence-based Estimate of Patient Harms Associated with Hospital Care” (2013) says that 1% of deaths in the hospital are adverse deaths. However, note that many adverse deaths weren’t plausibly preventable by anyone other than Omega.

If you’re having a high-stakes operation done, “Operative Mortality and Procedure Volume as Predictors of Subsequent Hospital Performance” (2006) recommends taking into account a hospital’s historical mortality rate and volume for that procedure: if you’re getting heart surgery, you want to go to the hospital that handles lots of heart surgeries, and has done so successfully in the past.

Other studies I looked at:

Green tea

Unfortunately, there’s no all-cause mortality data on the impact of tea in general, or green tea in particular. We might expect it to have an effect through flavonoids.

As a proxy, though, we can look at blood pressure, where lower blood pressure is better. From “Green and black tea for the primary prevention of cardiovascular disease” (2013, N=821):

» Systolic blood pressure, -3.18[-5.25, -1.11] mmHg

» Diastolic blood pressure, -3.42[-4.54, -2.30] mmHg

There’s a smaller effect from black tea, around half the size.

Cochrane also looked at green tea prevention rates for different cancers. From “Green tea for the prevention of cancer” (2009, N=1.6 million), it’s unclear whether there’s strong evidence of an effect for any cancer, and there’s a possible garden of forking paths on top of that.

If you’re already drinking tea, like me, then switching to green tea is low cost despite any questions about efficacy.

Borderline efficacy

Baby Aspirin

Baby aspirin is the practice of taking tiny daily doses of aspirin, mainly to combat cardiovascular disease. From “Low-dose aspirin for primary prevention of cardiovascular events in Japanese patients 60 years or older with atherosclerotic risk factors: a randomized clinical trial.” (2014, N=14,464):

» Aspirin vs none, aggregate cardiovascular mortality HR = 0.94[0.77, 1.15]

That CI width is very concerning; you can cut the data so you get subsets of cardiovascular mortality to become significant, like looking at only non-fatal heart attacks, but it’s not like there’s a breath of correcting for multiple comparisons anywhere, and the study was stopped early due to “likely futility”.

The side effects of baby aspirin are also concerning. Internal bleeding is possible (Mayo clinic article), since aspirin is acting as a blood thinner; however, it isn’t too terrible, since it’s only a 0.13% increase in “serious bleeding” that resulted in hospitalization (from “Systematic Review and Meta-analysis of Adverse Events of Low-dose Aspirin and Clopidogrel in Randomized Controlled Trials” (2006)).

More concerning is the stopping effect. “Low-dose aspirin for secondary cardiovascular prevention – cardiovascular risks after its perioperative withdrawal versus bleeding risks with its continuation – review and meta-analysis” looked into cardiovascular risks when stopping a baby aspirin regime before surgery (because of increased internal bleeding risks), and found that a low single-digit percentage of heart attacks happened shortly after aspirin discontinuation. (I’m having trouble interpreting this report.)

I imagine this is why professionals start recommending baby aspirin to folks above 50yo, since the risks of heart attack start to obviously outweigh the costs of taking aspirin constantly. And speaking of cost: baby aspirin is monetarily inexpensive.

Other studies I looked at:

Meal Frequency

Some people recommend eating smaller meals more frequently, particularly to lose weight, which is tied to health outcomes.

From “Effects of meal frequency on weight loss and body composition: a meta-analysis” (2015, N=unknown):

» +1 meal/day, -0.27 ± 0.11 kg of fat mass

It’s not really an overwhelming result; taking into account the logistical overhead of planning out extra meals in a society based on 3 square meals a day, is it really worth it to lose maybe half a kilogram of fat?

Other studies I looked at:

Caloric Restriction

Most longevity folks are really on board the caloric restriction (CR) train. There’s an appealing mechanism where lower metabolic rates produce fewer free radicals to damage cellular machinery, and it’s the exact amount of effort that one might expect from a longevity intervention that actually works.

A common example of CR is the Japanese Ryukyu islands, where there is a surprising number of really old people who eat a surprisingly low number of calories. However, say it with me: con-found-ed to he-ll! The fact that a single isolated subsection of a single ethnic group has a correlation between CR and longevity doesn’t make me confident that I too can practice CR and tell death to fuck off for a few more years.

So we want studies. Unfortunately, most humans fall into the state of starving and lacking essential nutrients, or having enough calories and nutrients, but almost never the middle ground of having too few calories but all the essential nutrients (2003, literature review). Then there’s the ethics of getting humans to agree to a really long study that controls their diet, so let’s look at animal studies first.

However, different rhesus monkey studies give different answers.

» From “Impact of caloric restriction on health and survival in rhesus monkeys from the NIA study” (2012, N=unknown, no full text), there was no longevity increase from young or old rhesus monkeys.

» However, from “Caloric restriction delays disease onset and mortality in rhesus monkeys” (2009, N=76), there was a 30% reduction in death over 20 years.

Thankfully they’re both randomized, but it doesn’t really help when they end up with conflicting conclusions. You’d hope there would be better support even in animal models for something that should have huge impacts.

What else could we look at? We’re not going to wait for an 80-year human study to finish (the ongoing CALERIE study comes close), so maybe we could look at intermediate markers that are known to have an impact on longevity and go from there.

A CALERIE checkpoint study, “A 2-Year Randomized Controlled Trial of Human Caloric Restriction: Feasibility and Effects on Predictors of Health Span and Longevity” (2015, N=218), looks at the impact of 25% CR on blood pressure:

» Mean blood pressure change, around -3 mmHg (read from a chart)

Pretty good, but that’s also around the impact of green tea. Then there’s the implied garden of forking paths bringing in multiple comparisons, since the other studies in the same cluster look at multiple types of cholesterol and insulin resistance markers.

Finally, there are the costs: you have to exert plenty of willpower to actually accomplish CR. For something with such large costs, the evidence base just isn’t there.

Chocolate

Chocolate has some impact on blood pressure. “Effect of cocoa on blood pressure” (2017, N=1804, Cochrane) finds that eating chocolate lowers your blood pressure:

» Systolic blood pressure, -1.76[-3.09, -0.43] mmHg

» Diastolic blood pressure, -1.76[-2.57, -0.94] mmHg

However, if you’re normotensive, there’s no impact on blood pressure; looking only at hypertensive participants, the effect jumps to -4 mmHg. Feel free to keep eating your chocolate, but don’t expect miracles.

Social Interaction

Having a social life looks like a really great intervention.

From “Social Relationships and Mortality Risk: A Meta-analytic Review” (2010, N=308,849):

» Stronger vs weaker relationships, OR of survival = 1.50[1.42, 1.59]

And from “Social isolation, loneliness, and all-cause mortality in older men and women” (2013, N=6500):

» Highest vs other quintiles of social isolation, HR = 1.26[1.08, 1.48]

And from “Marital status and mortality in the elderly: A systematic review and meta-analysis” (2007, N>250,000, no full text):

» Married vs all currently non-married, RR = 0.88[0.85, 0.91]

You can propose a causal mechanism off the top of your head: people with more friends are less depressed, and being less depressed has good health outcomes.

However, the alarm bells should be ringing: is the causal relationship backwards? Are healthier people more prone to socializing? Do the confounders never end? The kicker is that all these studies are looking at the elderly (above 50yo at least), which reduces their general applicability even more.

Other studies I looked at:

Cellphone Usage

Remember when everyone was worried that chronic cellphone usage was going to give us all cancer?

Well “Mobile Phone Use and Risk of Tumors: A Meta-Analysis” (2008, N=37,916) says it actually does:

» Overall tumor, OR = 1.18[1.04, 1.34]

» Malignant tumor, OR = 1.00[0.89, 1.13]

Since malignant tumors are the ones we’re actually worried about, and that OR sits right at 1.00, it’s hard to say we should be worried about cellphones.

Other studies I looked at:


Confusing thirst with hunger

Some people recommend taking a drink when you feel hungry, the idea being that thirst sometimes manifests as hunger, and you can end up eating fewer calories.

Unfortunately, I couldn’t find any studies that tried to look into this specifically: the closest thing I found was “Hunger and Thirst: Issues in measurement and prediction of eating and drinking” (2010) which reads like a freshman philosophy paper, and “Thirst-drinking, hunger-eating; tight coupling?” (2009, N=50?) which fails to persuade me about… anything, really.

Stress Reduction in a Pill

There are some “natural” plants rumored to have stress reduction effects, such as Rhodiola rosea and Ashwagandha root.

A meta-analysis on Rhodiola, “The effectiveness and efficacy of Rhodiola rosea L.: A systematic review of randomized clinical trials” (2011, N=unknown), found that Rhodiola had effects on something, but the study was basically a fishing expedition. Even the title betrays that it doesn’t matter what Rhodiola is effective at, just that it’s effective.

Another meta-analysis, “Rhodiola rosea for physical and mental fatigue: a systematic review” (2012, N>176) looked specifically at fatigue and found mixed results.

A study on Ashwagandha, “Prospective, Randomized Double-Blind, Placebo-Controlled Study of Safety and Efficacy of a High-Concentration Full-Spectrum Extract of Ashwagandha Root in Reducing Stress and Anxiety in Adults” (2012, N=64), found reductions in self-reported stress scales and cortisol levels (and it’s an RCT!).

Look, the Ns are tiny, and the studies the meta-analyses are based on are old, and who knows if the Russians were conducting their side of the studies right (Rhodiola originated in Russia, so many of the studies are Russian).

I’m including this because I got excited when I saw it in the original longevity post: stress reduction in a pill! Why do the hard work of meditation when I could just pop some pills (a very American approach, I know)? It just doesn’t look like the evidence base is trustworthy, and my personal experiences confirm that if there’s an effect it’s subtle (Whole Foods carries both Rhodiola and Ashwagandha, so you can try them out for yourself for like $20).

Other studies I looked at:

Water Filters

Unfortunately, there’s basically no research on health effects from water filtration in 1st world countries above and beyond municipal water treatment. Most filtration research is either about how adding any filtration in 3rd world countries has massive benefits, or about how bacteria can grow on activated carbon granules. Good to know, but on reflection, did we expect bacteria to stop growing wherever they damn well please?

So keep your Brita filter, but it’s not like we know for sure whether it’s doing anything either. Probably not worth it to go out of your way to get one.

Hand sanitizer

So I keep hand sanitizer in multiple places in my apartment, but does it do anything?

I only found “Effectiveness of a hospital-wide programme to improve compliance with hand hygiene” (2000, N=unknown), which focused on hospital health outcomes impacted by hand washing adherence. First, not all doctors wash their hands regularly (40% compliance rates in 2011, per a scholarly overview), which is worrying. Second, there’s a positive trend between hand washing (including hand sanitizers) and outcomes:

» When hand washing adherence rose from 48% to 66%, the hospital-wide infection rate decreased from 16.9% to 9.9%.

However, keep in mind that home and work are usually less hostile environments than a hospital: there are fewer people with compromised immune systems and fewer gaping wounds (hopefully). The cited result is probably an upper bound for us non-hospital folk.
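To line these numbers up with the RR figures used elsewhere in this post, here’s a minimal Python sketch that turns the before/after infection rates into a relative risk and an absolute change. Keep in mind this is a before/after comparison within one hospital, not a randomized trial, so the ratio describes what happened rather than proving hand washing caused all of it. `compare_rates` is just an illustrative helper name.

```python
def compare_rates(before: float, after: float) -> tuple[float, float]:
    """Return (relative risk, absolute change in percentage points)
    for two event rates given as fractions."""
    relative_risk = after / before
    absolute_change_pts = (after - before) * 100
    return relative_risk, absolute_change_pts


# Hospital-wide infection rates quoted above (before vs. after the programme).
rr, abs_pts = compare_rates(before=0.169, after=0.099)
print(f"RR = {rr:.2f}, absolute change = {abs_pts:+.1f} points")
```

Both views matter: the relative risk is what gets compared across studies, while the absolute change is what you actually feel.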

(There’s also this cute study: hand sanitizer contains chemicals that make it easier for other chemicals to penetrate the skin, and freshly printed receipts have plenty of BPA on the paper. This means that sanitizing and then handling a receipt will lead to a spike of BPA in your bloodstream. I presume that relative to eating with filthy hands the BPA impact is negligible, but damn it, researchers are doing these cute small scale studies instead of the huge randomized trials I want.)

Other studies I looked at:

Doctor visits

Should you visit your doctor for an annual checkup? My conscientious side says “of course”, but my contrarian side says “of course not”.

Well, “General health checks in adults for reducing morbidity and mortality from disease” (2012, N=182,880, Cochrane) says:

» Annual checkup vs no exam, RR = 0.99 [0.95, 1.03]

So basically no impact! Ha, take that, couple hour appointment!

However, the Chicago Tribune notes some mitigating factors: for example, the main studies the meta-analysis is based on are old, like 1960s old.
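Why does that interval amount to “no impact”? Ratio estimates like RR are usually analyzed on the log scale, so you can back out an approximate standard error from a 95% CI and turn it into a rough z-score and p-value. The sketch below assumes the interval was built as log(estimate) ± 1.96 × SE; I haven’t checked that this is exactly how the Cochrane review computed it, so treat the output as a ballpark. `p_from_ratio_ci` is a hypothetical helper name.

```python
import math


def p_from_ratio_ci(estimate: float, lower: float, upper: float) -> tuple[float, float]:
    """Approximate (z, two-sided p) for a ratio estimate (RR or OR) from its
    95% CI, assuming the CI was built as log(estimate) +/- 1.96 * SE."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(estimate) / se
    # Two-sided p-value from the standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Annual checkup vs no exam, from the Cochrane figure above.
z, p = p_from_ratio_ci(0.99, 0.95, 1.03)
print(f"z = {z:.2f}, p = {p:.2f}")
```

On the checkup numbers this gives a p-value far above any usual threshold. The same function applied to the CPR odds ratio further down, 1.72 [1.01, 2.95], lands just under p = 0.05, which is another way of saying that interval barely clears 1.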


I didn’t look at metformin in my main study period: I knew it had some interesting results, but it also caused gastrointestinal distress, better known as diarrhea. It brings to mind the old quip: metformin doesn’t make you live longer, it just feels like it[9].

However, while I was reading Tools of Titans, I came across an intriguing idea from Dominic D’Agostino: he would titrate the metformin dose up from some tiny amount until he started exhibiting GI symptoms, and then dial it back a touch. I don’t think anyone has even started doing small-scale studies around this, but it might be worth looking into.


There’s some stuff that doesn’t have a cost-benefit calculation attached, but I’m including it anyway. Put another way, these are things that won’t help you, but might help the people around you.


From “Effectiveness of Bystander-Initiated Cardiac-Only Resuscitation for Patients With Out-of-Hospital Cardiac Arrest” (2007, N=4902 cardiac arrests):

» Cardiac-only CPR vs no CPR, OR = 1.72 [1.01, 2.95]

So the odds ratio looks pretty good, except that the CI is really wide, and in absolute terms most people still die from cardiac arrest: administering CPR raises the chances of survival from 2.5% to 4.3%. So spending more than a few hours practicing CPR is chasing a real tail risk[10].

However, if two people in your friend group know CPR, they can provide a potential buff to everyone around them (two, because you can’t give CPR to yourself). In a similar vein, the Heimlich maneuver might be good to know.
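The gap between a decent-looking odds ratio and a small absolute gain is easy to check. Here’s a minimal sketch that recomputes a crude odds ratio from the 2.5% and 4.3% survival figures, plus the absolute difference and the number needed to treat (how many bystander CPR attempts it takes, on average, for one extra survivor). The crude ratio won’t exactly match the reported 1.72, which may be adjusted for other factors; I haven’t verified how the paper computed it. The helper names are illustrative.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def summarize(p_control: float, p_treated: float) -> dict[str, float]:
    """Crude odds ratio, absolute difference (percentage points),
    and number needed to treat for two survival probabilities."""
    diff = p_treated - p_control
    return {
        "odds_ratio": odds(p_treated) / odds(p_control),
        "absolute_diff_pts": diff * 100,
        "number_needed_to_treat": 1 / diff,
    }


# Survival without CPR vs with cardiac-only CPR, as quoted above.
stats = summarize(p_control=0.025, p_treated=0.043)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the number needed to treat works out to roughly 56: a real benefit, but one you’d expect to cash in only rarely.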

Other studies I looked at:

Smoke Alarm testing

Death by fire is not super common. That said, these days it’s cheap to set up a reminder to check your alarm on some long interval, like 6 months.


It’s unlikely you’ll need to do trauma medicine in the field, but if you’re paranoid about tail risk then QuikClot (and competitors) can serve as a buttress against bleeding out. Some folks claim that tourniquets are better, but the trauma bandages are a bit more versatile, since you can’t tourniquet your chest.

It’s not magical: since the entire thing becomes a clot, it basically just buys you time to move a life-threatening wound from the field into a hospital. Also make sure to get the bandage form, not the powder; some people have been blinded when the wind blew the clot precursor into their eyes.


Of course, this post wouldn’t be complete without a nod to cryonics. It’s the ultimate backstop. If all else fails, there’s one last option to make a Hail Mary throw into the future.

Obviously there are no empirical RR values I can give you: you’ll have to estimate your own probabilities and weigh your own values.

WTF, Science?

The overarching story is that we cannot trust anything, because almost all the studies are observational and everything could be confounded to hell by things outside the short list that every study incants they controlled for and we would have no idea.

Like Gwern says, even the easiest things to randomize, like giving people free beer, aren’t being done, much less on a scale that could give us some real confidence.
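To make the confounding worry concrete, here’s a toy simulation with made-up numbers: a pill that does absolutely nothing, taken mostly by health-conscious people, who also die less for unrelated reasons. The observational comparison should come out around 0.54 (0.07 / 0.13 worked out by hand), making the useless pill look great, while randomizing who gets the pill pulls the ratio back to about 1. The model, its probabilities, and the `relative_risk` helper are all hypothetical.

```python
import random


def relative_risk(n: int, randomized: bool, seed: int = 0) -> float:
    """Death rate among pill-takers divided by death rate among non-takers.

    Toy model with made-up numbers: the pill has zero effect; only the
    hidden 'health conscious' trait changes the death rate.
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    people = {True: 0, False: 0}
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        if randomized:
            # Coin flip, blind to the confounder.
            took_pill = rng.random() < 0.5
        else:
            # Observational: health-conscious people seek out the pill.
            took_pill = rng.random() < (0.8 if health_conscious else 0.2)
        # Death risk depends only on the confounder, never on the pill.
        died = rng.random() < (0.05 if health_conscious else 0.15)
        people[took_pill] += 1
        deaths[took_pill] += died
    return (deaths[True] / people[True]) / (deaths[False] / people[False])


print(f"observational RR: {relative_risk(100_000, randomized=False):.2f}")
print(f"randomized RR:    {relative_risk(100_000, randomized=True):.2f}")
```

A real study would “control for” health consciousness only if someone thought to measure it, which is exactly the problem.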

There is too little regard for the garden of forking paths in this post-replication-crisis world, and many studies are focused on subgroups that plausibly won’t generalize (ex. the elderly).

And what’s up with the heterogeneity in meta-analyses? If every single analysis results in “these results displayed significant heterogeneity”, then what’s the point? What are we doing wrong?

What am I doing?

Maybe you want to know what I myself am doing; I suspect people would be interested for the same reason journalists intersperse a perfectly good technical thriller with human interest vignettes, so here:

  • Continuing vitamin D supplementation, and getting a couple minutes of sun when I can.
  • Making an effort to eat more vegetables, less bacon/potatoes (to be honest, I’m more optimistic about cutting out the bacon than potatoes), more fish, and replacing more of my snacking with walnuts.
  • Keep taking fish oil.
  • Exercise better: I haven’t upped the intensity of my routine in a while. I probably need some more aerobic work, too.
  • Tell myself I should iron out my sleep schedule.
  • Get myself a standing desk for home: I have a standing desk at work, so I’m already halfway there.
  • Buy an air filter: low impact, but whatever, gimme my percentage points of RR.
  • Switch from drinking black tea to green tea.
  • Cut back on donating blood. I’ll keep doing it because it’s also wrapped up in “doing good things”, but I was doing it partly selfishly based on the non-quasi-randomized studies. Besides, I have shitty blood.


Effective and certain:

  • Supplement vitamin D.

Effective, possibly confounded:

  • Exercise vigorously 5 hours/week.
  • Eat more fruits and vegetables, more fish, less red meat, cut out the bacon.
  • Get 7-9 hours of sleep.

Less effective, less certain:

  • Brush your teeth and floss daily.
  • Try to not sit all day.
  • Regarding air quality, don’t live in Beijing.

There is also a presentation.

[1]  If you need me to go through the science of smoking, then let me know and I can do so: I mostly skipped it because I’m already not smoking, and the direction of my study was partly determined by what could be applicable to me. As a non-smoker, I didn’t even notice it was missing until a late editing pass.

[2]  The abstract reports results in terms of percentage mortality decrease, which I believe maps to the same RR I gave.

[3]  If I remember correctly, Due Diligence talks about this.

[4]  The Cochrane Group does good, rigorous analysis work. Gwern is an independent researcher in my in-group, and he seems to be better at this sort of thing than I am.

[5]  Annoyingly, some meta-analyses don’t report the aggregate sample sizes for analyses that only use a subset of the analyzed reports.

[6]  For example, Scott’s review of The Hungry Brain points out that some people think potatoes are great at satiating appetites, so it might in fact work out in favor of being okay.

[7]  These category comparisons are loose, since some studies will report quartiles and others will use tertiles, so the analysis simply goes with the largest effect possible across all studies.

[8]  Yes, it’s fucking stupid I have to stoop to this.

[9]  Originally “marriage doesn’t make you live longer, it just feels like it.”

[10]  I know, it’s ironic that I’m calling this a tail risk, when we’re pushing something as stubborn as the Gompertz curve.

Filed under: Uncategorized

9s of Cats

Epistemic status: value judgement.

The internet has a lot of cat pictures.

Let's say I upload a cat picture to Amazon's Simple Storage Service (S3). As of writing, their marketing materials claim that a stored object is 99.999999999% likely to stay securely stored in a year, which translates into a 50% chance of losing a given cat picture over roughly 70 billion years[1]. In storage/networking jargon, this is 11 9's of durability: a fast n' dirty logarithmic shorthand for how reliable a service is, found by counting the 9s in the percentage. For example, 99.9% would be 3 9's.

This doesn't mean that Amazon is super optimistic and thinks the chance of total nuclear war or perfect storm pandemic is some tiny percentage. It's just that if civilization does collapse then former customers would want Amazon warriors over Amazon refunds. Conditional on the continued existence of Amazon, the business, they'll probably keep doing crazy replication schemes[2] to maintain those crazy guarantees.

However, smaller apocalypses will leave Amazon broken while humanity lives on[3]. In these futures, I could easily imagine children gathered around a working fin de siècle computer wondering why in the world this cat looks so grumpy.

So certain cat photos might in fact have 11 9's of durability, enough to live 9 lives over and over. What about humans?

Looking at the 2014 CDC death rates, there are 823.7 deaths/100,000 people, working out to a 99.18% annual durability for a randomly selected human (American), for a measly 2 9s of durability. If you show someone a cat picture when they are 12, at best you can expect them to hold onto that memory with 2 9s of durability, because after that they are likely dead[4].

Cat pictures hold together with 11 9s: humans hold together with 2 9s.
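The 9s arithmetic is easy to reproduce. This sketch counts 9s as -log10 of the annual loss probability, and computes the coin-flip horizon for the S3 figure assuming an independent, constant loss probability every year. That assumption is fine for a storage guarantee but wrong for humans, whose mortality climbs with age, so the sketch only counts 9s for the human case (about 2.08, which the post rounds down to 2). The helper names are illustrative.

```python
import math


def nines(annual_loss_prob: float) -> float:
    """Durability in 'nines': 0.001 annual loss (99.9%) -> 3.0."""
    return -math.log10(annual_loss_prob)


def years_to_coin_flip(annual_loss_prob: float) -> float:
    """Years until survival probability drops to 50%, assuming the same
    independent loss probability every year (not true for humans)."""
    # log1p keeps precision when the loss probability is tiny, like 1e-11.
    return math.log(0.5) / math.log1p(-annual_loss_prob)


cat_loss = 1e-11               # S3's advertised 99.999999999%
human_loss = 823.7 / 100_000   # 2014 CDC US death rate quoted above

print(f"cat:   {nines(cat_loss):.1f} nines, "
      f"50% loss after {years_to_coin_flip(cat_loss):.3g} years")
print(f"human: {nines(human_loss):.1f} nines")
```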

It seems a little incongruous, yes? One is a chuckle-worthy image, and the other is a person.

I mean, there is a good reason: one is much more complicated than the other. Grumpy Cat herself will die far before her image does (maybe that's why she's grumpy?). We can barely simulate nematode neural systems, and even simply mapping a human's brain connectome (connection graph) is still prohibitively expensive, much less running the entire graph forwards in time[5].

Instead of doing the naive thing suggested by the S3 analogy and trying to scan people to replicate them across availability zones[6], we could simply extend their lives. For example, we've boosted the general US life expectancy from 40 years to 80 years since the early 1800s. But note:

y(t) = a \cdot e^{-b \cdot e^{-c \cdot t}}

It's not even "fuck the exponential", it's "fuck the double exponential". If we find some fantastic intervention in a pill that reduces our relative risk of death by half without any side effects, that halves the b value, which means this only moves the curve over a few years[10]:

[Figure: two curves, one for normal humans and the other for humans with half the relative risk (RR).]
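Why halving b slides the curve sideways rather than reshaping it: write b/2 as b times e^{-ln 2} and fold that factor into the inner exponential. This is my own quick derivation using the same y(t) as above, not something taken from the original sources:

a \cdot e^{-\frac{b}{2} e^{-c t}} = a \cdot e^{-b \cdot e^{-\ln 2} \cdot e^{-c t}} = a \cdot e^{-b \cdot e^{-c \left(t + \frac{\ln 2}{c}\right)}} = y\left(t + \frac{\ln 2}{c}\right)

So the halved-b curve is just the original curve shifted along the time axis by a fixed ln(2)/c, no matter how good the pill is; the size of the shift is set entirely by c.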

We'll somehow need to invalidate this model with our mental fists.

(At this point, I should point out that there are some people working on the problem with an eye towards halting or reversing aging[11], like The SENS Foundation and The Methuselah Foundation. They are nonprofits, and could always use more money: if nothing else, they could make a bigger incentive prize of the XPrize sort.)

But I didn't write this post to complain about our problems, I wrote this post because:

  1. coining "9s of cats" was too tempting to pass up.
  2. consider this a weak post-pre-registration[12] of an informal study I did on well-supported longevity actions we common folk can do today. Sure, the things we do are still subject to the steep demands of the Gompertz curve, but we want to maximize our chances of hitting Kurzweil's escape velocity if/when it happens.

Stay tuned.


[1]  Note that this is for a specific object, and not for a set of objects. If you have 100 billion objects, you should expect to see about one of them go missing each year, and that would be within the guarantee.

[2]  If you want an example of the sorts of replication large tech companies use, you can check out Facebook's blob store.

[3]  Note that while I work for a competitor of Amazon, I don't intend for this to be a pleasant daydream, but a nightmarish one. Also, it bears repeating that I do not presume to speak for my employer, etc etc.

[4]  This doesn't even include things like Alzheimer's, which destroy the people without destroying their bodies.

[5]  Contrast this with genome sequencing costs, which have dived faster than exponentially. Today, you can get your genome sequenced for around $1k (the exact price is hidden behind a quote request, but I've heard from biologists that Illumina whole-genome sequencing is around that much, and Veritas Genetics also has a quote for around that much). It's possible that high resolution scanning technology will hit a similar trend, but it might not.

[6]  Availability zones: broad sectors with non-overlapping support, the theory being that bringing down one zone doesn't bring down the others. Concretely, it would be harder to kill you for good if you had copies living in both Europe and Asia.

[7]  Quip appropriately lifted from Ra, the Space Magic chapter.

[8]  To be fair, that's going from "lol leeches for everyone" to "well, let's scrape your bones out and put them in another person, and hey presto, they stopped dying!".

[9]  More by Gwern on his longevity page.

[10]  Graphic generated using an R+ggplot2 script, available as a Github Gist. I use the same curve that Gwern does circa 2017.

[11]  There are arguments against extending human lifespans, like overcrowding, but that's silly. Droning on about the sanctity of death because it's the Dark Ages is fine, but defaulting to death because oh no there are problems to overcome is a damn defeatist attitude. If you haven't read Bostrom's The Fable of the Dragon Tyrant, it's a gentle fairy-tale introduction to non-deathism.

[12]  A pre-registration, so I can't just sweep things under the table, and weak, because I've already done the bulk of the research and analysis.

Filed under: Uncategorized

Subdermal Scientific Delivery

Epistemic status: crap armchair theorizing.

PutANumOnIt points out that psychology is broken. Having read Robyn Dawes’ House of Cards and Andrew Gelman’s post on the replication crisis, I agree with him: it is kind of crappy that it’s been years since the replication crisis and still nothing seems to have changed.

However, I disagree with the shape of his reaction, both online and in person (I was in the same room with him and the psychology student). What he said was true and necessary, but his frustration wasn’t usefully channeled. I think that adding the 3rd Scott Alexander comment requirement[1], kindness, would have at least very minutely helped move us towards a world of better science.

Why kindness? Well, how could we fix psychology without it? Some fun ideas:

  • The government could set higher standards for the science it funds.
  • Scientific journals could uphold higher standards.
  • The universities that host the psychology professors could start demanding higher standards from the professors, like for granting tenure.
  • The APA (American Psychological Association) could publish guidelines pushing for higher standards[2].
  • Psychology curriculum writers could emphasize statistics more.

If we could do any of these with a wave of a wand, any one of these would… well, wouldn’t end the crisis, but it would push things in the right direction.

However, we don’t have a wand, so I’m not confident any of these are going to happen with the prevailing business as usual.

  • The journal, APA, and curriculum-writer solutions are recursive: the psychologists themselves are integral parts of those processes. It’s possible to push on non-recursive parts, like getting a key textbook writer to include an extra chapter on probabilistic pitfalls[3], but trying to hook a key figure is difficult[4].
  • Curriculum writers set their sights on the next generation, not the current one. It seems like the curriculum is already slowly changing, but waiting for the entire field to advance “1 death at a time” is kind of slow.
  • The government is going to move slowly, and special interests like pharmaceutical companies invested in softer standards would throw up (probably non-obvious) roadblocks. Also, the APA has much more cachet with the government than me or Andrew Gelman. David and Goliath is a morality tale, not a blueprint for wild success.

    Or, more concretely, how do you get psychologists to not tell their patients to call their congressmen, because they’re being put out of a job as collateral damage in a campaign for better science?[5]

And notice that these are all large efforts: what does it mean to convince the government to have higher standards for the science it funds? It’s an opaque monolithic goal with an absolute ton of moving parts behind the scenes, most of which I’m blissfully ignorant of. These actions are so big that it’s easy to give in to the passive psychological warfare (ha!) and give up. It’s The Art of War, convincing people to accept defeat without even fighting by just impressing them with the apparent momentum of the problem. What could one do to turn that juggernaut?

In contrast, I want to focus on the opposite end of the scale; what if we tried to convince our lone psychology graduate student to consider better statistical methods?

But how? If you squint hard enough, it’s a sort of negotiation: we want the student to spend a non-trivial amount of time learning lots of statistics, while the student probably does not want to spend their Friday evenings reading about how to choose Bayesian priors. We need to convince the student that they should care, if not on Friday evening, then sooner rather than later.

Let’s borrow some ideas from the nauseatingly self-help-y book “Getting Past No”:

  1. “Go to the balcony”: make sure to step back and separate the frustration at poor science from the goal of getting better science.
  2. “Step to their side”: I imagine the psychologists would like to do good science, to take pride in their work and have it stand the test of time. However, just telling someone that there’s a replication crisis isn’t helping them deal with it, it’s putting yet another item on their stack full of things all clamoring for their attention while seeming vaguely negative. And how does it go? “No one ever got fired for choosing <field standard here>”. We will want something more positive…
  3. “Build them a golden bridge”: at the very least, we need to make it easy to use the better statistical methods[6], and offer support to those that are interested. Even better would be demonstrating that the methods we’re offering are better than the old and tired methods they’re using: for example, Jaynes recounts a story in “Probability Theory”, where geological scientists accused him of cheating because the Bayesian methods he used simply could not have been that good.

You’ll note that this is super abstract and not at all a blow-by-blow playbook for convincing anyone about scientific processes. Indeed, the whole point of starting with a single graduate student is to figure out what the actual playbook is. As startup parlance puts it, “do things that don’t scale”: even if I directly convinced 1 psychologist a day to use better statistical methods, America mints more than 365 psychologists in a year. But if I instead found a message that tightly fit the profession and then posted it on the internet, there would be a chance it could take off. (More on this in the Appendix.)

At some point, it’s not enough to have a message that can convince graduate students: if we want to have an impact on timescales shorter than a generation, we’ll have to solve the hard problem of changing a field while most of the same people are still working in it. So, an equally hand-wavey game plan for that scenario:

  1. Ideally, get one of their graduate students on board to provide trusted in-house expertise, and to find out what sorts of problems the research group is facing.
  2. Convince the local statistics professor to endorse you: that way, you can get past the first “this guy is a crank” filters.
  3. (¿¿¿) Somehow convince the professor to consider your methods, who probably wants to work more on his next grant application and less on learning arcane statistics. Apply liberal carrot and stick[7] to refocus their attention on the existential threat slowly rolling towards them. (???)

I expect every community organizer to roll their eyes at my amateur hour hand waving around “and then we convince person X”. However, I am confident we do need to do the hard ground work to make the revolution happen.

In the end, I think we hope to make something like one of the following happen:

  • virally spread an 80/20 payload of better statistics among psychologists, and get a silent supermajority of psychologists who all adhere on the surface to current institutional norms, but who eventually realize “wait, literally all my colleagues also think our usage of p values is silly”, so that a fast and bloodless stats revolution can happen.
  • move the psychology Overton window enough that an internal power struggle to institute better practices can plausibly succeed, led by psychologists that want to preserve the validity of their field.
  • in the course of convincing the entire field, figure out how to actually “statistical spearphish” up and coming field leaders, so they can save their field from the top[8].

So when I heard Jacob express a deep frustration to the student conveying “your methods are bad” (true) which was easily interpretable as “you should feel bad” (probably not intended), I saw the first step of the above revolution die on the vine. Telling people to feel bad (even unintentionally) is not how you win friends and influence people! To head off an obvious peanut gallery objection, it’s not like we’re allowing bad epistemology to flourish because oh no someone might find out they were wrong and feel bad so we can’t say anything ever. It is more pragmatic: compare trying to force someone to accept a new worldview, versus guiding them with a Socratic dialog to the X on the map so they unearth the truth themselves.

Maybe the common community that includes Jacob and me doesn’t want to devote the absolutely ludicrous resources needed towards reforming a field that doesn’t seem to want to save itself[9]. At the very least, though, we should try not to discourage those who come seeking knowledge, as our graduate student did.

And the alternative? That’s easy, we don’t do anything. Just let psychology spew bad results and eventually crash and bleed out, taking lent scientific credibility with it. I don’t think the field is too big to fail, but it sure would be inconvenient if it did.

(And since you’re the sort of person that reads this blog, then I might add that destroying a field focused on human-level minds right as a soft AI take off starts producing human-level complexity minds might be a poor idea[10].)

However, let’s raise the stakes: what if it’s not just psychology? I have a friend working in another soft-ish science field, closer to biology, and he reports problems there too. An upcoming post will in passing point out some problematic medical research. Again, I don’t think destroying psychology would bring down the entire scientific enterprise, but I do think destroying all fields as soft as biology would. So saving psychology is a way to find out if we can save science from statistical post-modernism; as the song goes “if you can make it there, you can make it anywhere”.

Maybe I’ll take up the cause. Maybe not[11]. If I do, more on this later.

Appendix: Other Actions, Other Considerations

Not everything is trying to convince people in 1-on-1 chats or close quarters presentations/workshops. Especially once we figure out what the scientists need and how we can get it to them, I think we’ll need:

  • better statistical material support geared towards working scientists. Similar to the website idea floated earlier in the post, having a central place that has all the practical wisdom will make it easier to scale.
  • provide better statistical packages that aren’t arcane and insane (looking at you, R), and do The Right Thing by default, and warn when you’re doing the wrong thing and why it is wrong. However, this will likely end up being in the existing statistical ecosystems like R, since that’s where the users are. Similar to the previous point, this also includes better tutorial and usage support.

Other things would help, but are harder in ways I don’t even know how to start solving:

  • As House of Cards recommends, we could stop requiring therapists to do original research. That’s like requiring medical students to get unrelated undergrad degrees for a touch of class around the office: expensive, inflating the need for positive research, and of dubious help. Yes, reducing credentialism is difficult.
  • Stop requiring positive results for publication. This is a problem for most scientific fields: you need publications to get a PhD, and you need positive results to publish because negative results aren’t exciting. So you get p-hacking to get published, because you’ve told people “lol sink or swim” and by god they’re going to bring illegal floaties.
  • Or, give negative replications more weight/publication room. The downside is that this will probably increase animosity in the field, which professionals don’t want, so there will still be costs to overcome. Changing the culture so people detach themselves from their results will be… difficult.

[1]  Scott Alexander’s blog, Slate Star Codex, has a comment policy requiring comments be true, necessary, or kind, with at least two of those attributes.

[2]  Sure, guidelines don’t cause higher standards directly, but they make it much easier to convince people who pay attention, especially those who aren’t already entrenched.

[3]  This specific strategy is additionally prone to failure since teachers pick and choose what material to use from the textbook buffet, so a standalone section on statistics would likely go unused. An entire textbook using unfamiliar statistics would be an even tougher sell.

[4]  In case it’s not clear: trying to convince key figures that they should do a thing is difficult, because if they were easy to convince, then every crank that walked into their office could have the key figure off on their own personal goose chase.

[5]  Yes, there isn’t a 1-to-1 mapping between demanding better statistics and putting therapists out of their job. However, if things have to become legislative, then it seems likely the entire field of psychology will be under attack, with non-trivial airtime going towards people with an axe to grind about psychology. And heaven forbid it become a partisan issue, but when has heaven ever cared?

[6]  In this regard, Stan by Andrew Gelman and co looks pretty interesting, even if I have no idea how to use it.

[7]  Yes, carrot and stick. We’ll need to introduce discussion of negative consequences sooner or later: if not the future destruction of science, then maybe something about their legacy or pride, or whatever.

[8]  Unlikely for the same reasons included in a previous footnote, but included for completeness.

[9]  The field as a whole, not counting individual people and groups.

[10]  A thousand and one objections for why this is a bad analogy spring to mind, but I think we could agree that conditional on this scenario, it couldn’t be worse to have a functioning field of psychology than not.

[11]  Remember, aversion to “someone has to, and no one else will”.

Filed under: Uncategorized
All content is licensed under CC-by-nc-sa