# Nathan Hwang

## Subdermal Scientific Delivery

Epistemic status: crap armchair theorizing.

PutANumOnIt points out that psychology is broken. Having read Robyn Dawes’ House of Cards and Andrew Gelman’s post on the replication crisis, I agree with him: it is kind of crappy that it’s been years since the replication crisis and still nothing seems to have changed.

However, I disagree with the shape of his reaction, both online and in person (I was in the same room with him and the psychology student). What he said was true and necessary, but his frustration wasn’t usefully channeled. I think that adding the 3rd Scott Alexander comment requirement[1], kindness, would have at least very minutely helped move us towards a world of better science.

Why kindness? Well, how could we fix psychology without it? Some fun ideas:

• The government could set higher standards for the science it funds.
• Scientific journals could uphold higher standards.
• The universities that host the psychology professors could start demanding higher standards from the professors, like for granting tenure.
• The APA (American Psychological Association) could publish guidelines pushing for higher standards[2].
• Psychology curriculum writers could emphasize statistics more.

If we could do any of these with a wave of a wand, any one of these would… well, wouldn’t end the crisis, but it would push things in the right direction.

However, we don’t have a wand, so I’m not confident any of these are going to happen with the prevailing business as usual.

• The journals, APA, and curriculum writers solutions are recursive: the psychologists themselves are integral parts of those processes. It’s possible to push on non-recursive parts, like getting a key textbook writer to include an extra chapter on probabilistic pitfalls[3], but trying to hook a key figure is difficult[4].
• Curriculum writers set their sights on the next generation, not the current one. It seems like the curriculum is already slowly changing, but waiting for the entire field to advance “1 death at a time” is kind of slow.
• The government is going to move slowly, and special interests like pharmaceutical companies invested in softer standards would throw up (probably non-obvious) roadblocks. Also, the APA has much more cachet with the government than me or Andrew Gelman. David and Goliath is a morality tale, not a blueprint for wild success.

Or, more concretely, how do you get psychologists to not tell their patients to call their congressmen, because they’re being put out of a job as collateral damage in a campaign for better science?[5]

And notice that these all sum up large efforts: what does it mean to convince the government to have higher standards for the science it funds? It’s an opaque monolithic goal with an absolute ton of moving parts behind the scenes, most of which I’m blissfully ignorant of. These actions are so big that it’s easy to give in to the passive psychological warfare (ha!) and give up. It’s The Art of War, convincing people to accept defeat without even fighting by just impressing them with the apparent momentum of the problem. What could one do to turn that juggernaut?

In contrast, I want to focus on the opposite end of the scale; what if we tried to convince our lone psychology graduate student to consider better statistical methods?

But how? If you squint hard enough, it’s a sort of negotiation: we want the student to spend a non-trivial amount of time learning lots of statistics, while the student probably does not want to spend their Friday evenings reading about how to choose Bayesian priors. We need to convince the student that they should care, if not on Friday evening, then sooner rather than later.

Let’s borrow some ideas from the nauseatingly self-help book “Getting Past No”:

1. “Go to the balcony”: make sure to step back and separate the frustration at poor science from the goal of getting better science.
2. “Step to their side”: I imagine the psychologists would like to do good science, to take pride in their work and have it stand the test of time. However, just telling someone that there’s a replication crisis isn’t helping them deal with it; it’s putting yet another vaguely negative item on a stack of things already clamoring for their attention. And how does it go? “No one ever got fired for choosing <field standard here>”. We will want something more positive…
3. “Build them a golden bridge”: at the very least, we need to make it easy to use the better statistical methods[6], and offer support to those that are interested. Even better would be demonstrating that the methods we’re offering are better than the old and tired methods they’re using: for example, Jaynes recounts a story in “Probability Theory”, where geological scientists accused him of cheating because the Bayesian methods he used simply could not have been that good.

You’ll note that this is super abstract and not at all a blow-by-blow playbook for convincing anyone about scientific processes. Indeed, the entire process of starting with convincing a single graduate student is to figure out what the actual playbook is. Like in startup parlance, “do things that don’t scale”: even if I directly convinced 1 psychologist a day to use better statistical methods, America mints more than 365 psychologists in a year. But, if I instead found a message that tightly fit the profession and then posted that on the internet, there would be a chance that could take off. (More on this in the Appendix.)

At some point, it’s not enough to have a message that can convince graduate students: if we want to have an impact on timescales shorter than a generation, we’ll have to solve the hard problem of changing a field while most of the same people are working in it. So, an equally hand-wavey game plan for that scenario:

1. Ideally, get one of their graduate students on board to provide trusted in-house expertise, and to find out what sorts of problems the research group is facing.
2. Convince the local statistics professor to endorse you: that way, you can get past the first “this guy is a crank” filters.
3. (¿¿¿) Somehow convince the professor, who probably wants to work more on their next grant application and less on learning arcane statistics, to consider your methods. Apply liberal carrot and stick[7] to refocus their attention on the existential threat slowly rolling towards them. (???)

I expect every community organizer to roll their eyes at my amateur hour hand waving around “and then we convince person X”. However, I am confident we do need to do the hard ground work to make the revolution happen.

In the end, I think we hope to make something like one of the following happen:

• virally spread an 80/20 payload of better statistics among psychologists, and get a silent supermajority of psychologists that all adhere on the surface to current institutional norms, but who eventually realize “wait, literally all my colleagues also think our usage of p values is silly”, at which point a fast and bloodless stats revolution can happen.
• move the psychology Overton window enough that an internal power struggle to institute better practices can plausibly succeed, led by psychologists that want to preserve the validity of their field.
• in the course of convincing the entire field, figure out how to actually “statistical spearphish” up and coming field leaders, so they can save their field from the top[8].

So when I heard Jacob express a deep frustration to the student conveying “your methods are bad” (true) which was easily interpretable as “you should feel bad” (probably not intended), I saw the first step of the above revolution die on the vine. Telling people to feel bad (even unintentionally) is not how you win friends and influence people! To head off an obvious peanut gallery objection, it’s not like we’re allowing bad epistemology to flourish because oh no someone might find out they were wrong and feel bad so we can’t say anything ever. It is more pragmatic: compare trying to force someone to accept a new worldview, versus guiding them with a Socratic dialog to the X on the map so they unearth the truth themselves.

Maybe the common community that includes Jacob and me doesn’t want to devote the absolutely ludicrous resources needed towards reforming a field that doesn’t seem to want to save itself[9]. At the very least, though, we should try not to discourage those that come seeking knowledge, as our graduate student was.

And the alternative? That’s easy, we don’t do anything. Just let psychology spew bad results and eventually crash and bleed out, taking lent scientific credibility with it. I don’t think the field is too big to fail, but it sure would be inconvenient if it did.

(And since you’re the sort of person that reads this blog, then I might add that destroying a field focused on human-level minds right as a soft AI take off starts producing human-level complexity minds might be a poor idea[10].)

However, let’s raise the stakes: what if it’s not just psychology? I have a friend working in another soft-ish science field, closer to biology, and he reports problems there too. An upcoming post will, in passing, point out some problematic medical research. Again, I don’t think destroying psychology would bring down the entire scientific enterprise, but I do think destroying all fields as soft as biology would. So saving psychology is a way to find out if we can save science from statistical post-modernism; as the song goes, “if you can make it there, you can make it anywhere”.

Maybe I’ll take up the cause. Maybe not[11]. If I do, more on this later.

# Appendix: Other Actions, Other Considerations

Not everything is trying to convince people in 1-on-1 chats or close quarters presentations/workshops. Especially once we figure out what the scientists need and how we can get it to them, I think we’ll need:

• better statistical material support geared towards working scientists. Similar to the website idea floated earlier in the post, having a central place that has all the practical wisdom will make it easier to scale.
• provide better statistical packages that aren’t arcane and insane (looking at you, R), and do The Right Thing by default, and warn when you’re doing the wrong thing and why it is wrong. However, this will likely end up being in the existing statistical ecosystems like R, since that’s where the users are. Similar to the previous point, this also includes better tutorial and usage support.

Other things would help, but are harder in ways I don’t even know how to start solving:

• Like House of Cards recommends, we could not require therapists to do original research. That’s like requiring medical students to get unrelated undergrad degrees for a touch of class around the office: expensive, inflating the need for positive research, and of dubious help. Yes, reducing credentialism is difficult.
• Stop requiring positive results for publication. This is the problem for most scientific fields, because you need publication to become a PhD, and you need positive results to publish because negative results aren’t exciting. So you get p-hacking to get published, because you’ve told people “lol sink or swim” and by god they’re going to bring illegal floaties.
• Or, give negative replications more weight/publication room. This would have the negative effect that it’ll probably increase animosity in the field, and professionals don’t want that, so there will still be costs to overcome. Changing the culture to detach yourself from your results will be… difficult.
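To put a rough number on the “illegal floaties” from the list above, here is a minimal simulation sketch, not a model of any real study. It assumes a world with no real effect at all, where each study compares two groups on several outcomes, the outcomes are independent, and the data are unit-variance normal (so a plain z-test is valid). The function names and parameters are my own inventions for illustration. The “hack” is just reporting whichever outcome has the smallest p-value.

```python
import math
import random


def two_sided_p(sample_a, sample_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(sample_a)
    diff = sum(sample_a) / n - sum(sample_b) / n
    z = diff / math.sqrt(2.0 / n)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(outcomes_tried, n_per_group=30, studies=1000,
                        alpha=0.05, seed=0):
    """Fraction of null studies that can report a "significant" result
    when the researcher tries several outcomes and keeps the best one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        p_values = []
        for _ in range(outcomes_tried):
            # Both groups come from the same distribution: no true effect.
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(group_a, group_b))
        if min(p_values) < alpha:
            hits += 1
    return hits / studies


if __name__ == "__main__":
    for outcomes in (1, 5, 20):
        print(outcomes, "outcomes tried:", false_positive_rate(outcomes))
```

With one outcome the rate should sit near the nominal 5%. With independent outcomes it should climb towards 1 − 0.95^k, which is about 23% for 5 outcomes and about 64% for 20. Real outcomes are usually correlated, so the real inflation is probably smaller than this sketch suggests, but the direction is the point.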

[1]  Scott Alexander’s blog, Slate Star Codex, has a comment policy requiring comments be true, necessary, or kind, with at least two of those attributes.

[2]  Sure, guidelines don’t cause higher standards directly, but it makes it much easier to convince people that pay attention, especially those that aren’t already entrenched.

[3]  This specific strategy is additionally prone to failure since teachers pick and choose what material to use from the textbook buffet, so a standalone section on statistics would likely go unused. An entire textbook using unfamiliar statistics would be an even tougher sell.

[4]  In case it’s not clear: trying to convince key figures that they should do a thing is difficult, because if they were easy to convince, then every crank that walked into their office could have the key figure off on their own personal goose chase.

[5]  Yes, there isn’t a 1-to-1 mapping between demanding better statistics and putting therapists out of their job. However, if things have to become legislative, then it seems likely the entire field of psychology will be under attack, with non-trivial airtime going towards people with an axe to grind about psychology. And heaven forbid it become a partisan issue, but when has heaven ever cared?

[6]  In this regard, Stan by Andrew Gelman and co looks pretty interesting, even if I have no idea how to use it.

[7]  Yes, carrot and stick. We’ll need to introduce discussion of negative consequences sooner or later: if not the future destruction of science, then maybe something about their legacy or pride, or whatever.

[8]  Unlikely for the same reasons included in a previous footnote, but included for completeness.

[9]  The field as a whole, not counting individual people and groups.

[10]  A thousand and one objections for why this is a bad analogy spring to mind, but I think we could agree that conditional on this scenario, it couldn’t be worse to have a functioning field of psychology than not.

[11]  Remember, aversion to “someone has to, and no one else will”.

## The Future of Football is too Near

Epistemic status: opinions and ranting

What does the future of football look like? Yes, it is totally about football; it starts out weird, just stick with it and you’ll get to the football[1].

Well, the rest of this post is about that story, so… spoilers ahoy.

Didn’t expect your football with a big dollop of science fiction, huh? I generally enjoyed it, which is why I made you read it: the narrative is great at progressively painting[2] an increasingly weird world, using multiple short stories to sketch out the implications of the “what if everyone stopped dying?” high concept. Some parts were cringeworthy: I would expect “if you think about it, everything is a miracle” to be overlaid on nature scenery in a flowing font and shared among people that think crystals affect your chi. Some of it was brilliant: I was a fan of using indentation/color to represent different speakers, instead of wading through “he said, she said”, which is a mechanic I hope to steal.

But you’ve read the story, you already know this. Instead, what I want to do is explore some external relationships to the story:

1. The author did not converse with the existing universe of science fiction. If you’re going to write science fiction, especially utopic science fiction, then not engaging with existing concepts and utopias just raises endless questions. In my case, it definitely left me with a sense of fridge logic.
2. The author sketches a world, and raises interesting questions in the tradition of science fiction that comments more about our current world and less about the world to come. Unfortunately, there’s too much emphasis on the commentary part of the story, and not on the story part. The author didn’t dialog with his characters to find out what they wanted, and instead just used his characters as a mouthpiece, which was distracting[3].

# On Boredom

Much of the story revolves around people that have lived a long time, and expect to live forever. Nothing else about them has been changed, though, which means that human attention spans are suddenly a lot shorter than human lives. Boredom looms as people quickly (relative to their lifespan) run out of things to do, and they end up sitting in caves and playing the same hand held video game for hundreds of years at a time in order to stretch out the novelty value.

If you squint in just the right way, it’s a sort of crazy mirror metaphor for our lives. We joke about multiples of internet years in a calendar year, and feel a weary sense that we’ve seen it all[4], that Reddit is full of reposts and that meme is so last week. We’re the ones running out of things to do, playing games that look suspiciously similar to games made decades ago while sitting in our (man) caves.

It’s a cute thought, but I reject the notion that the best humanity could do was putting a cannon on a mountain. For example, at least one of my friends would be out driving a car in real life Rocket League, complete with giant exploding ball and a 3rd person follower camera drone, with a slavish attention to using just the right materials in order to match the game physics. Okay, fine, Rocket League is car soccer, so obviously a sci-fi story about football wouldn’t cover any Rocket League related shenanigans. However, spaceball? Roboball, either of the Frozen Cortex or NFL robot mascot variety (limbs are open season!)? Mech Warrior ball? Mariana trench ball, with a genetically modified angler fish ball?

I mean, points for putting a restaurant on a football field, but that’s just scratching the surface for all the different things you could do that would still resemble football in some shape or form.

And outside of football, there’s just so much to do. It’s the post-scarcity far future, the wish-granting telephones are raining outside. And, well… once you’ve seen what can be done, why would you go back to playing football?

• In-story: re-freeze the ice caps and reclaim New York City. The most brute force solution is using sun shades, which is well within their technological grasp. Sure, the author wanted to advertise for climate change action, but the incongruity of “humanity has done everything and is now bored” vs “lol NYC is underwater can’t do anything about that” is jarring.
• In-story: throw the space probes some extra batteries, or a big-ass reactor. I appreciate the in-narrative way of ending the story, but again, it just makes the humans look incompetent or uncaring.
• In-story: become a “cyborg with laser cannons for arms and shit”. People were putting magnets inside of themselves years ago, and if they couldn’t die of sepsis, why would they stop there?
• I refuse to believe that nerds got together, said “damn, we’re in a post-scarcity economy! What do we do?” and then did not build a Niven-class ringworld around the sun. Or re-enact all of Star Wars, but with fully functional ships. Don’t think people would go through the work to do this? I present to you Ren Faires, Civil War re-enactments, and intricate cosplays, which most of these people are doing without a reasonable expectation of living forever.
• Or that someone didn’t sit down and think “man, you know what a random planet needs? A huge ass blue monolith! It’s an artistic statement!” like in Zima Blue.
• Terraform Mars[5], or uplift life on Earth, like in the aptly named Uplift series. Or seed a planet, and try to fast forward evolution[6][7]. We could call it evoball: first one to make a species that can win a football game against humans wins.
• Maybe physics is only local: how can you be sure? What if the Zones of Thought is an actual thing? You can only check by traveling to the center and fringes of the galaxy, which are quite far away. It’s too bad the rules of the universe probably prevent cryosleep.
• Or in a similar way, you can’t be sure that there are aliens which are more driven than you are. It’s reminiscent of trying to do acausal negotiation, or aliens growing up in a bad neighborhood (Watts short story on belligerence (pdf); Watts on organic Disneylands with no children[8]). However, there is no reason not to at least send out astrochickens to make sure.
• If you’re really out of things to do, run timing attacks on the universe (like at the end of Accelerando) just in case we’re in a simulation.

Why doesn’t the author think there will be things to do? Reading the author’s earlier story, The Tim Tebow CFL Chronicles, makes it abundantly clear that the author has confused The Great Stagnation’s argument of “we’ve picked a bunch of low hanging fruit, so innovation will slow down” with “there will be no further innovation beyond this point”. (If that isn’t what the author was trying to say, sorry, but making two stories in a row about the same thing is how you get labeled “the guy that writes stories with talking cats”.) Yes, slow down and smell the roses instead of checking Twitter again, but saying “it’s 17776, and we’re bored out of our minds” just ignores so much of what science fiction talks about[9]. Even a series focused solely on pure known physics science fiction, the Borden series of short stories, still comes up with stories worth telling and, eventually, lives worth living.

Instead of doing things, another acceptable answer is to attempt to become a bodhisattva, and spend all your time blissing out. Thinking about it, this might be how you could build the same piece of furniture 1000 times in a row, as a meditative exercise. However, the people in this story are not meditation masters: they’re just people desperately ignoring the enormity of the world around them and carrying on with a specific lifestyle brought to them by historical accident[10].

Which leads to another crazy-mirror concept of the story. “We just hang out” is “we just hang out”, and applies just as cleanly to the immortals and us. We killed god in the 1800s, and plagued ourselves with existential ennui and a fear that we’re just wasting time. The only difference is that in the story the god of death has been removed, so actions have even less direction imposed on them. The author answers obliquely by putting in multiple characters coping with living forever by choosing some objective, and then striving for it. According to my understanding, this is also how most people that grapple with “what is my purpose in life?” eventually deal with it. It doesn’t seem like the author likes that answer, but neither does he propose anything else.

# On Children

A related thread hinted at in the story is the complete absence of children paired with effortless immortality. “Man, aren’t children great?” the story sighs. “They would have examined this lawn no one else has examined yet. I really wish we had children, so they could keep our world dynamic and interesting, instead of leaving it staid and boring.”

Which is fair (see Children of Men), but the author is already ignoring the children in front of him.

Admittedly, the children we know about are outside of the solar system, but they’re sentient! And furthermore, they don’t want to kill humans, or kill humans in the process of tiling the universe with atomic smiley faces! They care about football, which is a pretty human thing to do[11]!

So you can make sentient beings by feeding computers enough human culture, and seed their interests with whatever (the probes care about football due to football existing in their payload), and their seed system requirements are 1960s computers. It just takes thousands of years to grow one, which might scale with clock speed. At any rate, it beats being pregnant for 9 months, and having 3 probes become sentient with none of them turning out to be psychotic is a pretty good sign. And since they’re in silicon, they don’t have to only come in probe form, although they can if they want. Having only a few people wanting to take care of new robots should still result in a population explosion (see Down and Out in the Magic Kingdom), especially if even a small fraction of the robots want to hang out in the real world. And with each computational upgrade, the robots would become more like the overminds in The Culture, and the shaping of the world would become their story, not ours. Or, we might end up like Solarians in Asimov’s The Naked Sun, where each person has a cadre of robots and eschews human contact.

Do emulations count as humans? If you record all the electrical activity in the brain at the same time (which should be trivial in a world that already has nanotech), and have good enough physics models on fast enough computers, you can run existing humans in silicon also. Sure, they aren’t children per se, but after spending time copying themselves into clans, their societies will probably seem weird enough that they basically are. (See: Age of Em, Revelation Space’s discussion of alpha simulations, The Quantum Thief series)

Can humans simply be printed? Similar to these other suggestions, we know what a human is, and have precision nanotech, so the most brute force thing to do is just take some simulated DNA, stick it in simulated sex cells, and then run the physics models forward until the baby would be born, and then build the baby on a molecular level. Of course there are problems with this: Smalley convinced me years ago that nanotech is fabulously more difficult than nanotech pioneers like Drexler sell it as. However, we’re in 17776, and we’ve already hand waved these problems away with the fiat introduction of nanos.

Both the human-related creation methods, though, probably fall under the purview of “no more human children”, which neatly explains why no one is doing it.

But we’ve already seen that creating more electronic minds is possible: hell, that’s the whole opening conceit. Then, why aren’t there more of them? The unlikely answer is that no one wanted to make them: if nothing else, some enterprising human would figure out how to deliver new electronic minds closely matching human children in android form. The sinister answer is that the same mechanism that prevents conceiving babies prevents the deliberate creation of new minds in general (see the Greg Egan story Crystal Nights).

If that is true, how do you get children again? The answer is simple: kill god (for real this time).

You would solve two goals with one stone: fighting against a fantastically powerful entity means there is no reason to mope around in a facsimile of the 1990s, and if you win you would remove the restrictions placed on humanity. Pining after “true, unfabricated struggle”? You got it.

You can’t kill god, you say? I’ve never liked that people said that god is outside the magisterium of physics, when any link to the theological could be exploited to bring it into physics. Some elaboration: one model of the way we found atomic nuclei is by shooting particles at a very thin piece of metal foil, and seeing what happened. It’s a lot like throwing billiard balls into a box to figure out what’s in the box by how they bounce back and deflect. So, throw billiard balls at god, and see how they bounce back: the theological consumed by physics (or the other way around). Yes, god is traditionally much more complicated than the atomic structure, but then you could roughly model psychology as throwing (metaphorical) billiard balls at humans. The bottom line? GIT GUD at throwing billiard balls, or GTFO.

And to those that think it’s easy to get to know god, but impossible to move it? It would be giving up too soon to not even try; it’s not clear if they’re in an AI box, and you don’t know you can talk your way out of the box until you try. Better GIT GUD at talking to alien minds.

And if god is watching your thoughts, and changing them as you speak? All I can say is GIT GUD[12], and good luck.

# On Power

I do appreciate that the frame of the story stays the same as our current time, because trying to write post-singularity fiction is a shitshow.

However, it’s not just the fact that life is basically the same that is unbelievable, I also feel like the power structures as is are implausible.

Consider money. The cashier saying “want any money?” is super cute, a great overturning of our expectations about the economy. However, why would society still agree to have money? “Money was a horrifying abstraction that I had to scrape together in the past to make rent, but instead of saying FUCK YOU to money when we could, we decided to keep it around.” What?

Consider religion. “The Wages of Sin are Death”? Not anymore, sucker! Religions are memetic, and the old salvation and morality memes based on an afterlife won’t cut it anymore. What about a religion that preaches “if only the entire world believes, then the curse of eternal life will be lifted, and then we may enter Valhalla”? Or, “the Wages of Sin must be Death: if God has forsaken the world and will not accept us, then we will have to do it ourselves”. And no matter which religions develop, there would be no bumbling missionaries that can barely preach to a crowd of one, because every missionary would have had 10,000 hours of preaching practice many times over.

Consider nationality. In a new world, what Kurd would agree to the Turkey/Iraq borders as they are drawn? What Israeli or Palestinian would agree to the current borders? I’m skeptical about much of Africa keeping its internal shape, with colonial borders drawn willy nilly according to European dictates. Or for an example close to my heart, there were rumblings of splitting Washington state into Washington and Cascadia, to match the cultural divide of the state. In the limit, how can the current nation-states be stable, in the face of a vastly different world? Now, if everyone today had open borders, I would find this description of the future more compelling.

Well, maybe the states no longer actually carry weight: what is there to administer in a post-scarcity society? Well, there is conflict mediation, and there must be conflicts once you can print nuclear bombs. “Your stupid octagonal building is stupid, and you’re stupid!” they say right before nuking said building. Well, you would hope that once you got that old, you would be more gentle and understanding, more wise. We all know that older people are simply not petty, right? Adults could not have been involved in MsScribe. Old people don’t hold on to grudges, or get into inane fights with their neighborhood Homeowners’ Association (or see the spats in Disneyland in Down and Out in the Magic Kingdom). Or someone decides to be artistic and turn all of the Americas into a blank clean white canvas (also see XKCD #861), and hacks the nanos to do the deed. Again, goodbye old building, just this time it’s every old building in North America.

Or to put it in a less violent way: who decides what happens to the original Monets? Sure, scan it and re-constitute it atom for atom: we know elementary particles are interchangeable, so the copies would be effectively the original according to any conceivable test. And when people insist on a particular set of atoms that happen to maximally match our sense of continuity? Post-scarcity removes scarcity from an increasing set of things, but humans insist on keeping some scarce things. Spouse? Accept no substitutes, not conjured companions nor 30,000 grapefruits! Or intangibly, all needs can be met, except for the need for relationships and status and wanting to be the very best, like no one ever was. And we’re going to mediate these conflicts with 20th century states?

I’d expect something closer to The Archipelago, where folks associate with the people they want to associate with. When you don’t have anything but status games to play, why would you play them with people you don’t like, or who refuse to play the same status games? “I can recall a million digits of pi and can dunk on you, but you insist you’re better because you can recite all the lines of Sailor Moon and wrestle sharks.” If you squint, it’s just extrapolating from the existing trends: with the internet, we got a fantastic fragmentation of communities, each focused on their thing. Also see The Diamond Age: when it’s possible to just raise an island in the middle of the ocean and go live there with your friends who are weirdly into Victorian era top hats, instead of living next to the people that loudly insist fedoras are far superior, why wouldn’t you?

Back to the story: let’s grant that there’s still power, possibly in the form of a state, possibly in a way that closely approximates 20th century power structures with a president and all that. Let’s say that some authoritarian state made it to 17776 without overthrowing their dictator, but it finally happens in 17777. There’s a lot of pent up frustration with the dictator, but they can’t simply execute him; besides, execution might be too good for him. What do they do?

# On Darkness

So everything’s been fun and games up until now. What would people build with all the free time? Why aren’t there children, even though you can’t have children? (Life, uh, finds a way) Why are the power structures the same?

Well, what could go wrong?

Trigger warnings: torture, mind fuckery, suicide.

Let’s go back to the dictator. What if he was thrown into the sun? He’s obviously going to live, since the rules of the universe enforce that. However, he’s stuck in a 15,000,000C furnace. Depending on exactly how the rules of immortality work, he might be crushed. He would definitely be burning, or if everything except his brain is burned away, then living in enforced solitary confinement with no sensory input. If no one wants to dig him out of the sun, then he could stay there for a very, very long time. (Also see the Priest’s story in Hyperion).

Maybe simple burning for eons on end is too good for your enemies. Metamorphosis of Prime Intellect directly tackles this, where the application of endless torture permanently damages at least one person. Of course, the nanos are there to keep humans safe from each other, but all systems can be defeated, and as the tag line of Alien goes, no one can hear you scream in space.

If you stick a pole through someone’s brain, does it give them seizures, or does it maintain their previous mental state, or does the pole simply bounce off? If the powers that be just protect the person against physical assault, something more subtle might work; you can use magnets to induce changes in mind state in people. Watts extrapolates this to maintaining religion in his “A Word for Heathens” short story. Hey presto, Stockholm Syndrome in an MRI! It’s known that brainwashing doesn’t work[13], but things might change when you can actually alter thoughts in flight, or have enough time to experiment with changing the brain chemistry of a person.

Speaking of mind alterations, why are the streets of 17776 so full? They should be emptied by the final drug, wireheading. Just stimulate your reward centers in your brain, and do it endlessly. There would be problems with adapting to the constant stimulation, but I’m sure it could be worked out by 17776. Imagine: you can’t die, but you’re bored. You’ve played football for ten thousand years in a row, and five thousand years ago you were ready to die, having lived a full life. But the kids a street over are talking about a way out. You’ll live forever, and you won’t care, because you’ll be maximally blissed out. Once you wirehead, you won’t decide “man, my life kind of sucks, I should do something else” because nothing would suck, forever. And if even a tiny proportion of people each year decide to wirehead, over time the wireheading population subsumes the human population (see this fictional supporting report for Echopraxia). Eventually, everyone will be smiling, and they won’t be creatures of play, they’ll be creatures of happiness[14], forever.

Or maybe the powers that be decide that these outcomes are too horrifying to allow, so “artificial” modification by electrical or chemical or crowbar means is disallowed. Well, we have depressed people that we help with drugs: are they denied their mind altering chemicals? Did this god doom schizophrenics to an eternity of delusions? Is there some population off-screen that can only lie in bed, hopeless for either positive help or the sweet release of death?

Perhaps you understand now what I mean when I say that The Future of Football is the singularity for noobs. Compared to existing sci-fi options the story is kind of bland, where nothing exciting or too terrible has happened. It’s great for beginners, though, who haven’t really grappled with living forever or being in a high tech post-scarcity society, and need that “see, the future isn’t too wild, but why not think about these ideas?” headfake to get them to consider it[15].

Again, it’s a fine story; it doesn’t deserve a moniker like “a story for those that haven’t thought about the far future before, and won’t think about the far future again”. However, I think it does function best as a gateway drug into a whole universe of science fiction all excitedly dialoging about the products of our accumulated imaginations.

[1]  Of course it’s American football.

[2]  Pun alert: in the author’s earlier Tim Tebow CFL Chronicles, he refers to the images as paintings, when they’re just images processed with Photoshop filters.

However, this is similar to the econo-art idea I had. It’s derived from eco-rounds in CounterStrike, where players leverage lower-cost equipment to save up for a buy round. Econo-art is just low-cost art, which is just good enough to get the point across. You could spend lots of time making a single beautiful pre-photography realism style painting, or you could apply some Photoshop filters and finish the story in a reasonable time. Maybe more on this in the future.

[3]  “Didn’t you enjoy Harry Potter and the Methods of Rationality?” Well, I also enjoyed this story, so…

[4]  Possible counter: there used to be a lot more variation, but we’ve killed most of it as collateral damage in making the world legible. Thank the gods for global street-by-street GPS navigation, but we’ve lost our cheeses, and I don’t think they’re coming back in 3 days (but see these comments for discussion).

[6]  Fast forward evolution? For example, Seveneves muses on using epigenetic flexibility in order to adapt organisms more quickly to changing environments. This specific example probably can’t be made to work, even for 17776 societies, but there’s probably a lot of study to be brought to bear on genetics.

[7]  There’s an open question about whether running evolution is ethical, since most evolution involves actual death, and probably lots of suffering. I’ll let the 17776 ethics board weigh in on that.

[8]  I broke my back lifting Moloch to heaven, and all I got was a Disneyland with no children. If consciousness turns out to be extraneous, then it might get weeded out in intense competition, which is bad news for us people that value being conscious.

[9]  It doesn’t help that the author sketches a game labeled “capitalism run amok” with a critical tone (which looks an awful lot more like 4chan run amok). Sure, reject creative destruction (which, to be fair, becomes a lot less horrifying when you can no longer freeze to death on the streets), but then complaining that no one is making things to do is somewhat incongruous.

[10]  I also think the argument that “people rejected things that caused them to lose their connection to humanity” implies a different end state than the author paints. Why stop at the 1990s? Why not dribble it all the way back to the savanna, a sort of uber-Amish lifestyle that puts us back into the long childhood of humanity?

Then there’s the argument that people want things to stay the same, citing people that get older and keep everything the same. It’s certainly poetic, “Plates and portraits… would leave unbleached shadows of themselves in the paint”. However, I think the author is ignoring that as people get older, they break down: everything starts to hurt, mental agility declines. Why would they move the couch when they would throw out their back? Let’s reverse aging, and then see what the old folks do.

[11]  Future AI researchers that might be reading this, please do not use “love of football” as an AI friendliness metric. It will go poorly.

[12]  First, a horror story about ideas that cannot be thought, and ideas that can eat you alive. Sound similar?

As far as we know, there’s no telekill material in the universe. What could we plausibly do? One way to try and combat mind reading is to first scan your mind into a computer, and then homomorphically encrypt the scan, and then run it forwards with homomorphic encryption operations to simulate physics while feeding in things about the world. That way you can “think” about the problem without making it easy for god. Sure, once god notices, it would look for the encryption keys, or would keep watch for malicious thoughts joined with thoughts about homomorphic encryption, but these are both a bit harder than just looking for a mind thinking about overthrowing god. If you cannot win, and refuse to lose, impose costs.

[13]  I remember reading this from a semi-trusted source, but now I can’t find it, and can only find articles conveying “lol are your children being BRAINWASHED into a CULT?”.

[14]  My impression is that you will find wireheading abhorrent. “I almost felt transcendent joy. It was awful.” What matters is not that you find it abhorrent now, but whether you will always consider it abhorrent over the next 15,759 years. Without ironclad norms against wireheading, people will eventually try it.

[15]  Associated idea: future shock levels. It’s from 1999, which means that it’s woefully out of date, but the general idea still holds.

Filed under: Uncategorized

## Thoughts on My Tribe

Epistemic status: feelings and intuitions.

I’m an aspiring rationalist[1], and I count myself as a part of my local rationality interested community.

And it’s wonderful that the community is here! I can confidently say that if it weren’t, I would be less the sort of person I want to be[2]. It introduced me in quick succession to lots of intelligent people, a series of thoughtful ideas, and immersed me in an infectious self-improvement environment. In a more hands on way, it gave me valuable first lessons in people management when I found myself growing into the de facto leader of a rationalist group house. And, it gave me a people I could call my people[3].

But lately, the community has been dragging on my soul.

The drag is low-grade apprehension, because our de facto leader is leaving for that galactic attractor, The Great Bay Hole[4]. We’ve seen this story before[5]; one person steps up to run things, making sure that meetups happen and generally keeping up the community. Unfortunately, there are two ways this falls apart: first, the sort of person that becomes an energetic charismatic leader tends to reason themselves into a corner that requires them to move to The Bay so they can Save The World[6]. Or, if there’s something keeping them from moving to The Bay, then whatever that is can suddenly require more from them, so the person has to load shed, and leading the community will go out the window before whatever the Bay-Blocking Important Thing is. Either way, they end up leaving after a stint as the local community leader, leaving a vacuum of responsibility, which usually one person steps up to fill…

This sort of arrangement can be sustainably unsustainable, if there’s enough new energetic people that stick around for a few years before abdicating their position. However, there isn’t currently a clear energetic charismatic leader candidate. The people with babies? The people that will have babies soon? The people busy with school? The people busy with work? The people with unfortunate amounts of anxiety? Me?

I could talk at length about the different ways gardening[7] the community is a thorny proposition: I agonized over a few drafts of this post that were all about those difficulties[8]. However, most of the musing was quite abstract, and after thinking about it I realized that while most of my concerns were relevant, they weren’t ultimately important: they didn’t get at the heart of why I felt apprehensive.

The heart of my apprehension is a fear of ending up alone, becoming the sole person putting in non-trivial effort to keep the dream alive. If I pick up the mantle, then it becomes difficult to put back down; don’t I have a responsibility to the community that helped me so much? I should just suck it up and focus more and more energy on the management of the community. And then one day I’ll find myself muttering “somebody has to, and no one else will”, the same thing I internal monologued while burning myself out running the group house, and a phrase I firmly believe should be reserved for profoundly tragic figures, not your everyday run of the mill humans[9].

Yes, dropping responsibilities on the floor was/is/will always be an option: the global BATNA[10] has never been more amenable. But being the last one turning out the lights is sad, like I personally doomed my people to astronomical rents[11], horrific public transportation[12], a boring culture, and pleasant year-round weather. And when I consider the possible outcomes, it’s failure that weighs on my mind. Better to never try, instead of putting in a heroic effort and then seeing it all fall apart in the end anyways.

Well, when you put it that way the counter is obvious: don’t focus solely on the negative outcomes, duh. My counter-counter is wrapped up in the sprawling unpublished essay[13] on my expected cost/benefit for community gardening: high cost, potentially high reward with high variability. Against this uncertainty, I have a menagerie of personal hopes and dreams, a todo list the length of my arm with little of it directly tied to running a community. Is it worth it for me to step up into the energetic charismatic leader[14] role? To put it mildly, I’m uncertain, and it doesn’t help that even when I try to plug my ears, I still hear the doom and gloom rolling in over the community.

This story has a tentatively happy ending.

A recent[15] meetup tried to figure out what the group would be doing, and several people stepped up to take on temporary shared responsibility, with more people tentatively waiting in the wings if things fell through. We’ve tried a similar leadership sharing scheme before, which failed after a brief stint, but we haven’t tried it more than once and in this particular configuration, which makes this “an interesting experiment” and not “the definition of insanity”.

Yes, really, the fact that I physically saw a handful of people willing and able to help, not including myself, really upped my probability[16] that things could keep functioning, and not on Ye Olde Single Energetic Leader model we’ve been chugging forward with. I know from my experiences with the group house that foisting everything on one person produces less work overall, since there are no communication costs. However, not paying the initial costs to make sure people can provide extra capacity means any bump in workload is really a bump in workload for one person, who can’t delegate because the scaffolding isn’t in place. Plus, there’s no redundancy. Therefore, it’s worth the upfront costs to spread the work around, which makes this shared responsibility scheme exciting[17].

It’s not “everything is easy now, and nothing could possibly go wrong”; there are still real costs, and the problems to overcome are still difficult[18]. It’s more about putting to rest the feeling that “I’m the last line, and here I will make my stand alone” and transmuting it into “if I’m part of a last line, then at least I won’t be alone”. Which is the sort of thing you don’t alieve[19] until you see it moving into action, with the ideas and commitment rolling in.

Maybe everything isn’t hopeless bullshit. We’ll keep flying this plane, and with a little elbow grease, maybe we’ll fly it into the last sunset.

[1]  If you’re not familiar, it’s the sort of new wave rationalism originally based out of LessWrong (notably a shadow of its former self now), not the sort of enlightenment rationalism that insisted the world had to make sense, and damn it humanity had a duty to change the world if it didn’t conform. As it turns out, this old-school worldview runs into problems, which we hope to avoid.

[2]  Keeping in mind that being part of the community has most certainly altered my idea of what an ideal version of myself looks like.

[3]  Scott Alexander just recently talked about this, explaining it’s his karass, and I’m sure it’s my karass as well.

[4]  The global community started with an unusual number of people in The Bay, and because we’re not really on board with inefficiencies, obviously moving to the place with all your online friends makes sense. Once you’ve moved, it makes even more sense for your friends to move to The Bay…

[5]  Simplified account is simplified.

[6]  If it’s not Save The World, it might even be as simple as “it’s easier to run my startup there” or “all the people I want to collaborate with are there”.

[7]  “Gardening a community” is a nice way to formulate community growth, which I’ve stolen from Scott’s “In Favor of Niceness, Community, and Civilization”.

[8]  In abbreviated form, what I think makes gardening the community hard: our standards are high, so putting together content is daunting. We’re concerned about using rationality instrumentally, but our interests are varied, so it’s hard to get enough people together to put things to practice on the same target. Doing original work is difficult, since a lot of the low hanging fruits have been picked. We’re drawn from contrarian-heavy populations. Relatedly, we value truth over conformity, even if it makes things more inefficient. Generally, the modern community BATNA means it’s easier to leave groups with difficult problems, even if they are also important problems. Management of a community is not the same thing at all as dealing with whatever the community is focused on, so management is a chore instead of something that comes naturally. Personally, I have truncated social needs, so I would be okay as a Seder/Solstice rationalist (analog with Christmas/Easter Christian). I also think I’m missing a formative experience of exploratory collaboration that the community facilitated, which would help me feel that the community is important.

[9]  There are things worth doing this for, like challenging hell, but no matter how you cut it “running a meetup” is not in the same reference class.

[10]  Best Alternative To Negotiated Agreement (BATNA). It’s never been easier to simply leave; there are groups looking for members everywhere, and you’re not stuck in one village your entire life.

[11]  I recognize that saying this from the NYC metropolitan area is lol worthy, but at least we have a proper city.

[13]  The aforementioned first drafts of this post are basically that essay.

[14]  I’d have to work on the charisma. And the energetic also, probably. And, well, if we’re being honest, the leader part too…

[15]  It’s not-so-recent by this point; this post is like 2-3 months late on a timely issue, which doesn’t really work. Well, something for me to work towards.

[16]  I was also surprised by the extent to which I was moved by the social proof of people earnestly discussing things in a room.

[17]  I also recognize that if I have too much of a hand in designing this sort of organizational scaffolding, I’d probably be prone to second system effects. Something to watch out for.

[18]  We’re not even talking about the really hard sorts of problems, like solving climate engineering, nuclear proliferation, or intelligence foom scenarios. They’re much more mundane, like “what should we talk about next week?” or “how many game nights per month is too much?”. Solving the mundane problems will hopefully help progress on the harder problems, but we’ll have to see how that pans out.

[19]  Useful concept alert: you know something, you believe it. But do you feel it in your gut, do you alieve it?

Filed under: Uncategorized

## Sandbox Statistics: Methodological Insurgency Edition

Epistemic status: almost certainly did something egregiously wrong, and I would really like to know what it is.

I collect papers. Whenever I run across an interesting paper, on the internet or referenced from another paper, I add it to a list for later perusal. Since I'm not discriminating in what I add, I add more things than I can ever hope to read. However, "What Went Right and What Went Wrong": An Analysis of 155 Postmortems from Game Development (PDF) caught my eye: an empirical software engineering process paper, doing a postmortem on the process of doing postmortems? That seemed relevant to me, a software engineer that does wrong things once in a while, so I pulled it out of paper-reading purgatory and went through it.

The paper authors wanted to study whether different sorts of game developers had systematic strengths and weaknesses: for example, they wanted to know whether "a larger team produces a game with better gameplay." To this end, they gathered 155 postmortems off Gamasutra[1] and encoded each one[2] as claiming a positive or negative outcome for a part of the game development process, like art direction or budget. They then correlated these outcomes to developer attributes, looking for systematic differences in outcomes between different sorts of developers.

I'll be upfront: there are some problems with the paper, prime of which is that the authors are a little too credulous given the public nature of the postmortems. As noted on Hacker News, the companies posting these postmortems are strongly disincentivised from actually being honest; publicly badmouthing a business partner is bad for business, or airing the company's dirty laundry is bad for business, or even saying "we suck" is bad for business. Unless the company is going under, there's little pressure to put out the whole truth and nothing but the truth, and instead a whole lot of pressure to omit the hard parts of the truth, maybe even shade the truth[3]. It's difficult to say that conclusions built on this unstable foundation are ultimately true. A second problem is the absence of any discussion of statistical significance; without knowing if statistical rigor was present, we don't know if any conclusions drawn are indistinguishable from noise.

We can't do much about the probably shaded truth in the source material, but we might be able to do something about the lack of statistical rigor. The authors graciously publicized their data[4], so we can run our own analyses using the same data they used. Of course, any conclusions we draw are still suspect, but it means even if I screw up the analysis, the worst that could happen is some embarrassment to myself: if I end up prescribing practicing power poses in the mirror or telling Congress that cigarettes are great, no one should be listening to me, since they already know my source data is questionable.

Now we have a sandbox problem and sandbox data: how do we go about finding statistically significant conclusions?

# p-values

Before we dive in, a quick primer about p-values[5]. If you want more than this briefest of primers, check out the Wikipedia article on p-values.

Roughly speaking, a p-value is the chance of seeing data at least as extreme as ours if the null hypothesis, the boring, no interesting effect result, is true. The lower the p-value is, the harder it is to square our data with the boring outcome.

For example, if we're testing for a loaded coin, our boring null hypothesis is "the coin is fair". If we flip a coin 3 times, and it comes up heads twice, how do we decide how likely it is that a fair coin would generate this data? Assuming that the coin is fair, it's easy to see that the probability of a specific sequence of heads and tails, like HTH, is (\frac{1}{2})^3 = \frac{1}{8}. We need to use some combinatorial math in order to find the probability of 2 heads and 1 tail in any order. We can use the "choose" operation to calculate that 3 \text{ choose } 2 = {{3}\choose{2}} = 3 different outcomes match 2 heads and 1 tail. With 3 coin flips, there are 8 equally probable outcomes possible, so our final probability of 2 heads and 1 tail in any order is 3/8.

However, neither of these tells us how much to doubt that the coin is fair. Intuitively, the weirder the data, the less weight we should give to the null hypothesis: if we end up with 20 heads and 2 tails, we should be suspicious that the coin is not fair. We don't want to simply use the probability of the outcome itself, though: ending up with one of 100 equally probable outcomes is unremarkable (one of them had to win, and they were all equally likely to win), while ending up with an unlikely option instead of a likely option is remarkable. By analogy, receiving 1 inch of rain instead of 1.1 inches in Seattle is unremarkable, even if getting exactly 1 inch of rain is unlikely. Receiving any rain at all in the Sahara Desert is remarkable, even if it's the same probability as getting exactly 1 inch of rain in Seattle. The weirdness of our data depends not just on the probability of the event itself, but on the probability of other events in our universe of possibility.

The p-value is a way to solidify this reasoning: instead of using the probability of the outcome itself, it is the sum of the probability of all outcomes equally or less probable than the event we saw[6]. In the coin case, we would add the probability of 2 heads and 1 tail (3/8) with the probability of the more extreme results, all heads (1/8), for p=0.5.

But wait! Do we also consider a result of all tails to be more extreme than our result? If we only consider head-heavy results in our analysis, that is known as a one-tailed analysis. If we stay with a one-tailed analysis, then we will in essence be stating that we knew all along that the coin would always have more heads in a sequence, and we only wanted to know by how much it did so. This obviously does not hold in our case: we started by assuming the coin was fair, not loaded, so tails-heavy outcomes are just as extreme as heads-heavy outcomes and should be included. When we do so, we end up with p=1.0: the data fits the null hypothesis closely[7]. One-tailed analysis is only useful in specific cases, and I'll be damned if I fully understand those cases, so we'll stick to two-tailed analyses throughout the rest of this post.
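The coin arithmetic above is small enough to check by brute force. Here is a minimal Python sketch (the function names are made up for illustration, and the post's own analysis uses R rather than Python) that sums binomial probabilities for the one-tailed and two-tailed cases:

```python
from math import comb


def binomial_pmf(heads, flips, p_heads=0.5):
    """Probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


def one_tailed_p(heads, flips, p_heads=0.5):
    """Sum the observed outcome and every more heads-heavy outcome."""
    return sum(binomial_pmf(k, flips, p_heads) for k in range(heads, flips + 1))


def two_tailed_p(heads, flips, p_heads=0.5):
    """Sum every outcome that is equally or less probable than the observed one."""
    observed = binomial_pmf(heads, flips, p_heads)
    tolerance = 1e-12  # guards against floating point noise when probabilities tie
    probs = [binomial_pmf(k, flips, p_heads) for k in range(flips + 1)]
    return sum(prob for prob in probs if prob <= observed + tolerance)


print(one_tailed_p(2, 3))  # 0.5
print(two_tailed_p(2, 3))  # 1.0
```

Swapping in `two_tailed_p(20, 22)` gives the "20 heads and 2 tails" case from earlier, where the fair coin starts to look suspicious.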

If there were only ever two hypotheses, like the coin being fair, or not, then rejecting one implies the other. However, note that rejecting the null hypothesis says nothing about choosing between multiple other hypotheses, like whether the coin is biased towards heads or tails, or by how much a coin is biased. Those questions are certainly answerable, but not with the original p-value.

How low a p-value is low enough? Traditionally, scientists have treated p<0.05 as the threshold of statistical significance: if the null hypothesis were true, it would generate data this extreme less than 1/20th of the time purely by chance, which is pretty unlikely, so we should feel safe rejecting the boring null hypothesis[8].

There are problems with holding the p<0.05 threshold as sacrosanct: it turns out making p=0.05 a threshold for publication means all sorts of fudging with the p-value (p-hacking) happens[9], which is playing a part in psychology's replication crisis, which is where the 2nd part of this post's title comes from[10].

For these reasons, the p-value is a somewhat fragile tool. However, it's the one we'll be using today.

The first step is simple: before looking at any of the data, can we know whether any conclusions are even possible?

We can answer that with a power analysis, which tells us whether 155 postmortems is enough data to produce significant results. First, we need to choose the effect size we expect our data to display: usual values range from 0.1 (a weak effect) to 0.5 (a strong effect). Yes, it's subjective what you choose. We already know how many data points we have, 155 (normally we would be solving for this value, to see how big our sample size would have to be). Now, I'm not going to calculate this by hand; instead I'll use R, a commonly used statistical analysis tool (for details on running the analysis, see the appendix below). Choosing a "medium" effect size of 0.3 with n=155 data points tells us that we have a projected 25% false negative rate, a ~1/4 chance to miss an existing effect purely by chance. It's not really a great outlook, but we can't go back and gather more data, so we'll just have to temper our expectations and live with it.
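As a rough cross-check of that 25% figure, here is a Python sketch of the textbook normal-approximation power formula for a two-sided comparison of two groups. It assumes 155 observations in each group and treats 0.3 as a standardized effect size; both are assumptions made for this sketch, and the R call in the appendix may be set up differently.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group test."""
    normal = NormalDist()
    critical = normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n_per_group / 2)  # how far a real effect moves the test statistic
    # Chance of landing beyond either critical value when the effect is real.
    return normal.cdf(shift - critical) + normal.cdf(-shift - critical)


power = approx_power(effect_size=0.3, n_per_group=155)
print(f"power: {power:.2f}, false negative rate: {1 - power:.2f}")
```

Under these assumptions the power comes out to roughly 0.75, which lines up with a false negative rate of about one in four.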

What about looking at other parts of the general experiment? One potential problem that pops out is the sheer number of variables that the experiment considers. There are 3 independent variables (company attributes), and 22 dependent variables (process outcomes) that we think the independent variables affect, for a total of 3\cdot 22=66 different correlations that we are looking at separately. This is a breeding ground for the multiple comparisons problem: comparing multiple results against the same significance threshold increases the chances that at least one conclusion is falsely accepted (see this XKCD for a pictorial example). If you want to hold steady the chance that you falsely accept even one conclusion across the whole experiment, then you need to make the evidential threshold for each individual correlation stricter.
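To get a feel for how quickly this bites, here is a small Python sketch of the chance of at least one false positive when every test is held to the same threshold. It assumes the tests are independent, which is a simplification that likely does not hold exactly for the postmortem data.

```python
def familywise_error_rate(per_test_alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** num_tests


for num_tests in (1, 5, 66):
    rate = familywise_error_rate(0.05, num_tests)
    print(f"{num_tests:>2} tests: {rate:.3f}")
```

With 66 independent tests at 0.05 each, at least one false positive is close to guaranteed (about 0.97), which is why the per-comparison threshold has to tighten.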

But how much stricter? Well, we can pick between the Bonferroni, the Sidak, and the Holm-Bonferroni methods.

The Bonferroni method simply takes your overall threshold of evidence, and divides by the number of tests you are doing to get the threshold of evidence for any one comparison. If you have m=5 tests, then you have to be 5 times as strict, so 0.05/5 = 0.01. This is a stricter restriction than necessary: however, it's easy to calculate, and it turns out to be a pretty good estimate.

The Sidak method calculates the exact per-comparison threshold of evidence given the overall threshold (exact, that is, when the comparisons are independent). The previous method, the Bonferroni, is fast to calculate, but it calls some outcomes insignificant when there is in fact enough evidence to label those outcomes as significant. The Sidak method correctly marks those outcomes as significant, in exchange for a slightly more difficult calculation. The equation is:

p_{comparison} = 1 - (1 - p_{overall})^{1/m}

There's some intuition for why this works in a footnote [11].

If p_{overall}=0.05 (as is tradition) and m=5, then p_{comparison}=0.0102. This is not that much less strict than the Bonferroni bound, which is simply p_{Bonferroni}=0.01, but sometimes you just need that extra leeway.
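Both corrections are one-liners, so here is a Python sketch of them side by side. The function names are illustrative only, and the Sidak line is just the equation above rewritten in code.

```python
def bonferroni_threshold(overall_alpha, num_tests):
    """Per-comparison threshold: split the overall threshold evenly."""
    return overall_alpha / num_tests


def sidak_threshold(overall_alpha, num_tests):
    """Per-comparison threshold from the Sidak equation."""
    return 1 - (1 - overall_alpha) ** (1 / num_tests)


num_tests = 5
print(f"Bonferroni: {bonferroni_threshold(0.05, num_tests):.4f}")  # 0.0100
print(f"Sidak:      {sidak_threshold(0.05, num_tests):.4f}")  # 0.0102
```

The Sidak threshold is always slightly looser than the Bonferroni one, and the gap stays small for the traditional 0.05 overall threshold.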

The Holm-Bonferroni method takes a different tack: instead of asking each comparison to pass a stringent test, it asks only some tests to pass the strict tests, and then allows successive tests to meet less strict standards.

We want to end up with an experiment-wide significance threshold of 0.05, so we ask whether each p-value from low to high is beneath the threshold divided by the number of comparisons still left in line (itself included), and stop considering results significant once we reach a p-value that doesn't reach its threshold. For example: let's say that we have 5 p-values, ordered from low to high: 0.0001, 0.009, 0.017, 0.02, 0.047. Going in order, 0.0001 < 0.05/5 = 0.01, and 0.009 < 0.05/4 = 0.0125, but 0.017 > 0.05/3 = 0.0167, so we stop and consider the first two results significant, and reject the rest.
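The step-down procedure is easier to follow as a loop. Here is a minimal Python sketch of it (the function name is illustrative), using the same strict "beneath the threshold" comparison as the worked example:

```python
def holm_bonferroni(p_values, overall_alpha=0.05):
    """Return the p-values that stay significant under the Holm-Bonferroni procedure."""
    ordered = sorted(p_values)
    significant = []
    for position, p_value in enumerate(ordered):
        remaining = len(ordered) - position  # comparisons left in line, this one included
        if p_value >= overall_alpha / remaining:
            break  # the first failure ends the procedure; everything after it is rejected
        significant.append(p_value)
    return significant


print(holm_bonferroni([0.0001, 0.009, 0.017, 0.02, 0.047]))  # [0.0001, 0.009]
```

Note that 0.047 never gets compared against the lenient final threshold of 0.05, because the procedure already stopped at 0.017.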

There is a marvelous proof detailing why this works which is too large for this post, so I will instead direct you to Wikipedia for the gory details.

With these methods, if we wanted to maintain a traditional p=0.05 threshold with m=66 different comparisons, we need to measure each individual comparison[12] against a p-value of:

p_{Bonferroni}=0.000758
p_{Sidak}=0.000777
p_{Holm}=(\text{between } 0.000758 \text{ and } 0.05)

We haven't even looked at the data, but we're already seeing that we need to meet strict standards of evidence, far beyond the traditional 0.05 threshold. And with n=155 data points at best (not all the postmortems touch on every outcome), it seems unlikely that we can meet these standards.

Perhaps I spoke too soon, though: can the data hit our ambitious p-value goals?

# Testing the data

So how do we get p-values out of the data we have been given?

Keep in mind that we're interested in comparing different proportions of "this went well" and "this went poorly" responses for different types of companies, and asking ourselves whether there's any difference between the types of companies. We don't care about whether one population is better or worse, just that they have different enough outcomes. In other words, we're interested in whether the different populations of companies have the same proportional mean.

We'll use what's known as a contingency table to organize the data for each test. For instance, let's say that we're looking at whether small or large companies are better at doing art, which will produce the following table:

| | Small Company | Large Company |
|---|---|---|
| Good Art | 28 | 16 |
| Bad Art | 12 | 6 |

We want to compare the columns, and decide whether they look like they're being drawn from the same source (our null hypothesis). This formulation is nice, because it makes obvious that the more data we have, the more similar we expect the columns to look due to the law of large numbers. But how do we compare the columns in a rigorous way? I mean, they look like they have pretty similar proportions; how different can the proportions in each column get before they are too different? It turns out that we have different choices available to determine how far is too far.

## z-test, t-test

The straightforward option is built in to R, called prop.test. Standing for "proportionality test", it returns a p-value for the null hypothesis that two populations have the same proportions of outcomes, which is exactly what we want.

However, a little digging shows that there are some problematic assumptions hidden behind the promising name. Namely, prop.test is based on the z-test[13], which is built on the chi-squared test, which is built on the assumption that large sample sizes are available. Looking at our data, it's clear our samples are not large: a majority of the comparisons are built on fewer than 40 data points. prop.test handily has an option to overcome this, known as the Yates continuity correction, which corrects p-values for small sample sizes. However, people on CrossValidated don't trust Yates, and given that I don't understand what the correction is doing, we probably shouldn't either.

Instead, we should switch from using the z-test to using the t-test: Student's t-test makes no assumptions about how large our sample sizes are, and does not need any questionable corrections. It's a little harder to use than the z-test, especially since we can't make assumptions about variance, but worth the effort.

## Fisher

However, the t-test still makes an assumption that the populations being compared are drawn from a normal distribution. Is our data normal? I don't know, how do you even see if binary data (good/bad) is normal? It would be great if we could just sidestep this, and use a test that didn't assume our data was normal.

It turns out that one of the first usages of p-values matches our desires exactly. Fisher's exact test was devised for the "lady tasting tea" experiment, which tested whether someone could tell whether the milk had been added to the tea, instead of vice versa[14]. This test is pretty close to what we want, and has the nice property that it is exact: unlike the t-test, it is not an approximation based on an assumption of normal data.

Note that the test is close, but not exactly what we want. The tea experiment starts by making a fixed number of cups with the milk added first, and a fixed number of cups with the tea added first. This assumption bleeds through into the calculation of the p-value: as usual, Fisher's test calculates the p-value by looking at all possible contingency tables that are "more extreme" (less probable) than our data, and then adding up the probability of all those tables to obtain a p-value. (The probability of a table is calculated with some multinomial math: see the Wikipedia article for details). However, while looking for more extreme tables it only looks at tables that add up to the same column and row totals as our data. With our earlier example, we have:

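| | Small Company | Large Company | Total |
|---|---|---|---|
| Good Art | 28 | 16 | **44** |
| Bad Art | 12 | 6 | **18** |
| Total | **40** | **22** | |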

All the bolded marginal values would be held constant. See the extended example on Wikipedia, especially if you're confused how we can shuffle the numbers around while keeping the sums the same.
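If shuffling tables around by hand sounds tedious, here is a rough Python sketch of the same idea using only the standard library. It is not the R code used for the actual analysis, and the function names are made up for illustration. It walks over every 2x2 table with the same margins and adds up the probability of the ones that are no more probable than the observed table.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the table [[a, b], [c, d]] with all row and column totals held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    return comb(row1, a) * comb(row2, c) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    observed = table_probability(a, b, c, d)
    p_value = 0.0
    # The top-left cell x determines the rest of the table once the margins are fixed.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        prob = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if prob <= observed * (1 + 1e-9):  # small tolerance for floating point ties
            p_value += prob
    return min(p_value, 1.0)


# The art example: 28 and 16 good outcomes, 12 and 6 bad outcomes.
print(fisher_exact_two_sided(28, 16, 12, 6))
```

For the art table, the observed counts happen to be the single most probable arrangement given those margins, so every other table counts as "more extreme" and the p-value comes out at essentially 1: no evidence of a difference at all.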

This assumption does not exactly hold in our data: we didn't start by getting 10 large companies and 10 small companies and then having them make a game. If we did, it would be unquestionably correct to hold the column counts constant. As it stands, it's better to treat the column and row totals as variables, instead of constants.

## Barnard

Helpfully, there's another test that drops that assumption: Barnard's test. It's also exact, and also produces a p-value from a contingency table. It's very similar to Fisher's test, but does not hold the column and row sums constant when looking for more extreme tables (note that it does hold the total number of data points constant). There are several variants of Barnard's test based on how exactly one calculates whether a table is more extreme or not, but the Boschloo-Barnard variant is held to be always more powerful than Fisher's test.

The major problem with Barnard is that it is computationally expensive: all the other tests run in under a second, but running even approximate forms of Barnard takes considerably longer. Solving for non-approximate forms of Barnard with both columns and rows unfixed takes tens of minutes. With 66 comparisons to calculate, this means that it's something to leave running overnight with a beefy computer (thank the gods for Moore's law).

You can see the R package documentation (PDF) for more details on the different flavors of Barnard available, and all the different options available. In our case, we'll use Boschloo-Barnard, and allow both rows and columns to vary.

## Outcomes

So now we have our data, a test that will tell us whether the populations differ in a significant way, and a few ways to adjust our p-values to account for multiple comparisons. All that remains is putting it all together.

When we get a p-value for each comparison, we get (drum roll): results in a Google Sheet, or a plain CSV.

It turns out that precisely 1 result passes the traditional p=0.05 threshold with Barnard's test. This is especially bad: if there was no effect whatsoever, we would naively expect 66 \cdot 0.05 \sim 3 of the comparisons to give back a "significant" result. So, we didn't even reach the level of "spurious results producing noise", far away from the multiple comparison adjusted thresholds we calculated earlier.

This is partly due to such a lack of data that some of the tests simply can't run: for example, no large companies touched on their experience with community support, either good or bad. With one empty column, none of the tests can give back a good answer. However, only a few comparisons had this exact shortcoming; the rest likely suffer from a milder version of the same problem, where there were only tens of data points on each side, which doesn't give us much confidence in the data, and hence produces higher p-values.

In conclusion, there's nothing we can conclude, folks, it's time to pack it up and go home.

# p-value Pyrotechnics

Or, we could end this Mythbusters style: the original experiment didn't work, but how could we make it work, even if we were in danger of losing some limbs?

In other words, the data can't pass a p=0.05 threshold, but that's just a convention decided on by the scientific community. If we loosened this threshold, how far would we have to loosen it in order to have a statistically significant effect in the face of multiple comparisons and the poor performance of our data?

It turns out that reversing Bonferroni correction is impossible: trying to multiply p=0.023 (the lowest Barnard-Boschloo p-value) by 66 hands back 0.023 \cdot 66 \sim 1.5, which is over 1.0 (100%), which is ridiculous and physically impossible. The same holds for Holm-Bonferroni, since it's built on Bonferroni.

So let's ditch Barnard-Boschloo: the t-test hands back a small p-value in one case, at 5.14 \cdot 10^{-6}. This we can work with! 5.14 \cdot 10^{-6} \cdot 66 = 0.000339, far below 0.05. This is pretty good, this outcome even passes our stricter multiple-comparisons adjusted tests. But what if we wanted more statistically valid results? If we're willing to push it to the limit, setting p_{overall}=0.9872 gives us just enough room to snag 3 statistically significant conclusions, either with Bonferroni or Holm-Bonferroni applied to the t-test results. Of course, the trade-off is that we are virtually certain that we are accepting a false positive conclusion, even before taking into account that we are using p-values generated by a test that doesn't exactly match our situation.

Reversing Sidak correction gets us something saner: with 66 tests and our lowest Barnard-Boschloo p-value, p=0.023, we have an overall 1-(1-0.023)^{66}=p_{overall}=0.785. Trying to nab a 2nd statistically significant conclusion pushes p_{overall}=0.991. Ouch.
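The arithmetic for running the corrections backwards is short enough to sketch in Python. The function names are hypothetical, and the inputs are just the p-values quoted above.

```python
def reverse_bonferroni(p_comparison, m):
    """Overall threshold implied by a per-comparison p-value (can exceed 1, which is nonsense)."""
    return p_comparison * m


def reverse_sidak(p_comparison, m):
    """Overall threshold implied by a per-comparison p-value under Sidak (always below 1)."""
    return 1 - (1 - p_comparison) ** m


m = 66
print(round(reverse_bonferroni(0.023, m), 2))    # 1.52, not a valid probability
print(round(reverse_sidak(0.023, m), 3))         # 0.785
print(round(reverse_bonferroni(5.14e-6, m), 6))  # 0.000339
```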

This means that we can technically extract conclusions from this data, but the conclusions are ugly. A p=0.785 means that if there is no effect in any of our data, we expect to see at least one spurious positive result around 78% of the time. It's worse than a coin flip. We're not going to publish in Nature any time soon, but we already knew that. Isn't nihilism fun?

# Conclusions

So, what did we learn today?

• How to correct for multiple comparisons: if there are many comparisons, you have to adjust the strictness of your tests to keep the overall false positive rate where you want it.
• How to compare proportions of binary outcomes in two different populations.

At some point I'll do a Bayesian analysis for the folks in the back baying for Bayes: just give me a while to get through a textbook or two.

Thanks for following along.

# Appendix: Running the Analysis

If you're interested in the nitty gritty details of actually running the analyses, read on.

For doing the power analysis, you want to install the pwr package in R. In order to run a power analysis for the proportion comparison we'll end up doing, use the pwr.2p.test function (documentation (PDF)), and use n=155 data points and a "medium" effect size (0.3). The function will hand back a power value, which is the complement of the false negative rate (1-\text{"false negative rate"}). If you want to do a power analysis for other sorts of tests not centered around comparing population proportions, you will need to read the pwr package documentation for the other functions it provides.
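If you would rather stay in Python, the same kind of number can be approximated with the standard library. This is a sketch of the usual normal approximation for a two-sided, two-proportion test, not a port of the pwr package. I'm assuming that n counts observations per group and that 0.3 is Cohen's effect size h; `two_proportion_power` is a name I made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(h, n, sig_level=0.05):
    """Approximate two-sided power for comparing two proportions.

    h is Cohen's effect size and n is the number of observations in each group
    (both are assumptions about how the inputs are meant to be read).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - sig_level / 2)
    shift = h * sqrt(n / 2)
    # Probability of landing in either rejection region when the effect is real.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power = two_proportion_power(h=0.3, n=155)
print(f"power = {power:.3f}, false negative rate = {1 - power:.3f}")
```

Under those assumptions the power comes out at roughly three in four, so a real medium-sized effect would still be missed about a quarter of the time.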

Now on to the main analysis…

The "Codes" sheet contains all the raw data we are interested in. Extract that sheet as a CSV file if you want to feed it to my scripts. The "Results" sheet is also interesting in that it contains what was likely the original author's analysis step, and makes me confident that they eyeballed their results and that statistical power was not considered.

Second, we need to digest and clean up the data a bit. To paraphrase Edison, data analysis is 99% data cleaning, and 1% analysis. A bit of time was spent extracting just the data I needed. Lots of time was spent defending against edge cases, like case rows not all having the same variable values that should be the same, and then transforming the data into a format I better understood. There are asserts littering my script to make sure that the format of the data stays constant as it flows through the script: this is definitely not a general purpose data cleaning script.

You can check out the data cleaning script as a Github gist (written in Python).

This data cleaning script is meant to be run on the CSV file we extracted from the xlsx file earlier (I named it raw_codes.csv), like so:

python input_script.py raw_codes.csv clean_rows.csv 

The actual data analysis itself was done in R, but it turns out I'm just not happy "coding" in R (why is R so terrible?[15][16]). So, I did as much work as possible in Python, and then shipped it over to R at the last possible second to run the actual statistical tests.

Get the Python wrapper script, also as a Github gist.

Get the R data analysis script used by the wrapper script, also as a Github gist.

The R script isn't meant to be invoked directly, since the Python wrapper script will do it, but it should be in the same directory. Just take the CSV produced by the data cleaning step, and pass to the wrapper script like so:

python analysis.py clean_rows.csv \
--t_test --fischer_test \
--barnard_csm_test \
--barnard_boschloo_test 

This produces a CSV analysis_rows.csv, which should look an awful lot like the CSV I linked to earlier.

Math rendering provided by KaTeX.

[1] The video game community has a culture that encourages doing a public retrospective after the release of a game, some of which end up on Gamasutra, a web site devoted to video gaming.

[2] The authors tried to stay in sync while encoding the postmortems to make sure that each rater's codings were reasonably correlated with each other, but they didn't use a more rigorous measure of inter-rater reliability, like Cronbach's alpha.

[3] Even if the company is going under, there are likely repercussions a no-holds-barred retrospective would have for the individuals involved.

[4] It turns out Microsoft wiped the dataset supposedly offered (probably due to a site re-organization: remember, it's a shame if you lose any links on your site!), but thankfully one of the authors had a copy on their site. Kudos to the authors, and that author in particular!

[5] This is also your notice that this post will be focusing on traditional frequentist tools and methods. Perhaps in the future I will do another post on using Bayesian methods.

[6] One of the curious things that seems to fall out of this formulation of the p-value is that you can obtain wildly different p-values depending on whether your outcome is a little less or a little more likely. Consider that there are 100 events, 98 of which happen with probability 1/100, and one that happens with probability 0.00999 (event A), for 0.01001 remaining probability on the last event (event B). If event A happens, p=0.00999, but if event B happens, p=1.0. These events happen with mildly different probabilities, but lead to vastly different p-values. I don't know how to account for this sort of effect.

[7] This is kind of a strange case, but it makes sense after thinking about it. Getting an equal number of heads and tails would be the most likely outcome for a fair coin (even if the exact outcome happens with low probability, everything else is more improbable). Since we're flipping an odd number of times, there is no equal number of heads and tails, so we have to take the next best thing, an almost equal number of heads and tails. Since there are only 3 flips, the most equal it can get is 2 of one kind and 1 of another. Therefore, every outcome is as likely or less so than 2 heads and a tail.
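Both of the previous two footnotes lean on the same definition of a p-value: add up the probability of every outcome that is as likely as, or less likely than, what you observed. Here is a small Python sketch of that definition, run on both examples. The helper name and the event labels are made up for illustration.

```python
def p_value(probabilities, observed):
    """Sum the probability of every outcome no more likely than the observed one."""
    p_observed = probabilities[observed]
    # The tiny tolerance guards against floating point noise between tied outcomes.
    return sum(p for p in probabilities.values() if p <= p_observed + 1e-12)


# Footnote [6]: 98 events at 1/100 each, plus event A (0.00999) and event B (0.01001).
events = {f"event_{i}": 0.01 for i in range(98)}
events["A"] = 0.00999
events["B"] = 0.01001
print(p_value(events, "A"))  # 0.00999
print(p_value(events, "B"))  # close to 1.0

# Footnote [7]: number of heads in 3 flips of a fair coin.
coin = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}
print(p_value(coin, 2))      # 1.0
```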

[8] However, note that separate fields will use their own p-value thresholds: physics requires stringent p-values for particle discovery, with p=0.0000003 as a threshold.

[9] This wouldn't be such a big deal if people didn't require lots of publications for tenure, or accepted negative results for publication. However, we're here to use science, not fix it.

[10] Reminder: I'm almost certainly doing something wrong in this post. If you know what it is, I would love to hear it. TELL ME MY METHODOLOGY SINS SO I CAN CONFESS THEM. It's super easy, I even have an anonymous feedback form!

[11] So why does the Sidak equation have that form?

Let's say that you are trying to see Hamilton, the musical, and enter a lottery every day for tickets. Let's simplify and state that you are always 1 out of 1000 people competing for one ticket, so you have a 0.001 chance of winning a ticket each day.

Now, what are the chances that you win at least once within the next year (365 days)? You can't add the probability of winning 365 times: if you extend that process, you'll eventually have more than 100% chance of winning, which simply doesn't make sense. Thinking about it, you can never play enough times to guarantee you will win the lottery, just play enough times that you will probably win. You can't multiply the probability of winning together 365 times, since that would be the probability that you win 365 times in a row, an appropriately tiny number.

Instead, what you want is the probability that you lose 365 times in a row; then inverting that gets you the probability that you win at least once. The probability of losing on any one day is 0.999, so the probability of losing 365 times in a row is 0.999^{365} = 0.694. But we don't want the probability of losing 365 times in a row: we want the chance that doesn't happen. So we invert by subtracting that probability from 1, 1-0.694, for a total probability of winning equal to 0.306.

Generalizing from a year to any number of days N, this equation calculates the total probability of winning.

p_{total} = 1 - (1 - p_{winning})^N

Which looks an awful lot like the Sidak equation. The exponent contains an N instead of a \frac{1}{m}, since p_{total} corresponds with p_{overall} in the Sidak equation: solving for p_{winning} will net you the same equation.
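For the skeptical, the lottery arithmetic and the "solve for p_{winning}" step can be checked with a few lines of Python (the function names are mine):

```python
def chance_of_winning_at_least_once(p_winning, n_days):
    """One minus the chance of losing every single day."""
    return 1 - (1 - p_winning) ** n_days


def per_day_chance_needed(p_total, n_days):
    """Solve the equation above for p_winning: this has the same form as the Sidak equation."""
    return 1 - (1 - p_total) ** (1 / n_days)


p_total = chance_of_winning_at_least_once(0.001, 365)
print(round(p_total, 3))                              # 0.306
print(round(per_day_chance_needed(p_total, 365), 6))  # 0.001
```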

[12] An unstated assumption throughout the post is that each measure of each variable is independent of each other measure. I don't know how to handle situations involving less-than-complete independence yet, so that's a topic for another day. This will probably come after I actually read Judea Pearl's Causality, which is a 484 page textbook, so don't hold your breath.

[13] The manual page for prop.test was not forthcoming with this detail, so I had to find this out via CrossValidated.

[14] It's adorable how Victorian the experiment sounds.

[15] Allow me to briefly rant about R's package ecosystem. R! Why the fuck would you let your users be so slipshod when they make their own packages? Every other test function out there takes arguments in a certain format, or a range of formats, and then a user defined package simply uses a completely different format for no good reason. Do your users not care about each other? Do your dark magicks grow stronger with my agony? Why, R!? Why!?

[16] I suppose I really should be using pandas instead, since I'm already using python.


## Tools I Use

I’ve been thinking about whether the tools I use to get things done are good enough. Where are the gaps in my toolset? Do I need to make new tools for myself? Do I need to make tools that can make more tools[1]?

Before diving too deep, though, I thought it would be helpful to list out the tools I use today, why I use them, and how I think they could be better. It’s a bit of a dry list, but perhaps you’ll find one of these tools is useful for you, too.

# Getting Things Done

## Habitica/HabitRPG

Say what you will about gamification, but when it works, it works.

I wasn’t a habitual child, adolescent, or young adult. I had the standard brush/floss teeth habit when going to sleep, and nothing much beyond that. Sure, I tried to cultivate the habit of practicing the violin consistently, but that culminated with only moderate success in my early college years.

Then I picked up HabitRPG (now Habitica) in 2014, and suddenly I had to keep a central list of habits up to date on a daily basis, or I would face the threat of digital death. Previous attempts at holding myself to habits would track my progress on a weekly basis, or fail to track anything at all, but the daily do-or-die mentality built into Habitica got me to keep my stated goals at the forefront of my mind. Could I afford to let this habit go unpracticed? Am I falling into this consistent pattern of inaction which will get me killed in the long run? It was far from a cure-all, but it was a good first step to getting me to overcome my akrasia and do the things that needed to be done[2].

Currently, I only use the daily check-in features (“Dailies”): at first I also used the todo list, but it turned out that I wanted much, much more flexibility in my todo system than Habitica could provide, so I eventually ditched it for another tool (detailed below). I simply never got into using the merit/demerit system after setting up merits and demerits for myself.

## org-mode

I have tried making todo lists since I was a young teenager. The usual pattern would start with making a todo list, crossing a couple items off it over a week, and then I would forget about it for months. Upon picking it back up I would realize each item on the list was done, or had passed a deadline, or I didn’t have the motivation for the task while looking at the list. At that point I would throw the list out; if I felt really ambitious in the moment, I would start a new list, and this time I wouldn’t let it fade into obsolescence…

Habitica fixed this problem by getting me into the habit of checking up on my todo list on a regular basis, which meant my todo lists stopped getting stale, but the todo list built into the app was just too simple: it worked when I had simple one-step tasks like “buy trebuchet from Amazon” on the list, but complicated things like “build a trebuchet” would just sit on the list. It never felt like I was making forward progress on those large items, even when I worked for hours on them, and breaking up the task into parts felt like cheating (since you get rewarded for completing any one task[3]), but more importantly it made my todo list long, cluttered, and impossible to sort. Additionally, I wanted to put things onto the list that I wanted to do, but weren’t urgent, which would just compound how cluttered the list would be. For scale, I made a todo spreadsheet in college that accumulated 129 items, most of which weren’t done by the end of college and would have taken weeks of work.

So I needed two things: a way to track all of the projects I wanted to do, even the stupid ones I wouldn’t end up doing for years, and a way to track projects while letting me break them down into manageable tasks.

After a brief stint of looking at existing todo apps, and even foraying into commercial project management tools, I decided I was a special unique flower and had to build my own task tracker, and started coding.

After weeks of this, one of my friends started raving about org-mode, the flexible list-making/organization system built inside of Emacs (the text editor; I talk about it some more below). He told me that I should stop re-implementing the wheel: since I was already using Emacs, why not just hack the fancy extra stuff I wanted from a todo system on top of org-mode, instead of tediously re-implementing all the simple stuff I was bogged down in? So I tried it, and it’s worked out in exactly that way. The basics are sane and easy to use, and since it’s just an Emacs package, I can configure and extend it however I want.

Like I implied earlier, I use my org-mode file as a place to toss all the things that I want to do, or have wanted to do; it’s my data pack-rat haven. For example, I have an item that tracks “make an animated feature length film”[4], which I’m pretty sure will never happen, but I keep it around anyways because the peace of mind I can purchase with a few bytes of hard drive space is an absolute bargain. It doesn’t matter that most of my tasks are marked “maybe start 10 years from now”, just that they’re on ~~paper~~ disk and out of my head.

And like I implied earlier, org-mode really got me to start breaking down tasks into smaller items. “Build a trebuchet” is a long task with an intimidating number of things to do hidden by a short goal statement; breaking it down into “acquire timber” and “acquire chainsaw” and “acquire boulders” is easier to think about, and makes it clearer how I’m making progress (or failing to do so).

The last big feature of org-mode that I use is time tracking, which lets me log the time I spend on certain tasks. I do a weekly review, and org-mode lets me look at which tasks I worked on, and for how long. For example, I used to think that I wrote blog posts by doing continual short edit/revision cycles, but it turned out that I usually had the revision-level changes nailed down quickly, but then I had long editing cycles where I worried about all the minutiae of my writing. Now I’m more realistic about how much time I spend writing, and how quickly I can actually write, instead of kidding myself that I’ll be happy with just an hour of editing[5].

Org-mode isn’t for everyone. It only really works on desktop OS’s (some mobile apps consume/edit the org-mode file format, but only somewhat), so it’s hard to use if you aren’t tied to a desktop/laptop. And the ability to extend it is tied up in knowing an arcane dialect of lisp and a willingness to wrestle with an old editor’s internals. And you might spend more time customizing the thing than actually getting things done. But, if you’re bound to a desktop anyways, and know lisp, and have the self discipline to not yak shave forever, then org-mode might work for you.

## Inbox

Nothing out of the ordinary here, it’s just Google email. Aside from handling my email, I primarily use the reminders feature: if there are small recurring tasks (like “take vitamins”), then I just leave them in Inbox instead of working them into org-mode. At some point they’ll probably move into org-mode, but not yet.

## Keep / Evernote

I started using Evernote in 2011 or so, and switched to Keep last year when Evernote tried to force everyone to pay for it. Originally, I bought into the marketing hype of Evernote circa 2011: “Remember Everything”. Use it as your external brain. Memorizing is for chumps, write it down instead.

And I took the “Everything” seriously. How much did I exercise today? What did I do this week? What was that interesting link about the ZFS scrub of death? Why did I decide to use an inverted transistor instead of an inverted zener diode in this circuit? It’s all a search away.

I recognize that this level of tracking is a bit weird, but recalling things with uncanny precision is helpful. For example, while I was doing NaNoWriMo in November, I had years of story ideas and quips as notes; if I sort of half-remembered that I had an idea where Groundhog Day was a desperate action movie instead of a comedy, I could just look up what sorts of plot directions I had been thinking about, or if I had more ideas about the plot over time, and bring to bear all that pent up creative energy.

Less importantly, I use my note taking stream as a mobile intake hopper for org-mode, since there aren’t any mobile org-mode apps I trust with my todo list.

## Habit Group

And for something that isn’t electronic: I am part of a habit setting and tracking group. It’s a group of like-minded individuals that all want to be held accountable to their goals, so we get together and tell each other how we are doing while striving towards those goals. It’s using social pressure to get yourself to be the person you want to be, but without the rigid formality of tools like Stickk.

# Mobile Apps

## Anki

A spaced repetition app, free on Android. See Gwern for an introduction to and deep dive on spaced repetition.

I use it to remember pretty random things. There’s some language stuff, mainly useful for impressing my parents and niece with how easily I can pronounce Korean words. There’s some numbers of friends and family, in case I somehow lose my phone and find a functioning payphone. There’s a subset of the IPA alphabet, in case I need to argue about pronunciation.

I have some more plans to add to this, but mostly covering long-tail language scenarios. If you’ve read Gwern’s introduction above, you’ll remember that the research implies that mathematical and performance knowledge are not as effective to memorize through spaced repetition as language and motor skills, so I’m not really in a rush to throw everything into an Anki deck.

This is your reminder that if you’re not using two-factor authentication, you really should be. Two factor means needing two different types of things to log in: something you know (a password) and something you have (a phone, or other token). This way, if someone steals your password over the internet, you’re still safe as long as they don’t also mug you (applicable to most cybercriminals).

## Feedly

For reading RSS feeds. I follow some bloggers (SSC, Overcoming Bias), some science fiction authors (Stross, Watts), and the Tor.com short story feed.

However, Feedly isn’t especially good. The primary problem is the flaky offline support. Go into a tunnel? There’s no content cache, so you can’t read anything if you didn’t have the app open at the exact moment you went underground. (I imagine this is mostly a problem in NYC).

Plus, the screens are broken up into pages instead of being in one scrolling list, which is weird. It’s okay enough to get me to not leave, but I’m on the look out for a better RSS reader.

## Swarm

Location check-in app, throwing it back to 2012. Sure, it’s yet another way to leak out information about myself, like whether I’m on vacation, but governments and ginormous companies already can track me, so it’s more a question of whether I want to track myself. Swarm lets me do that, and do it in a way that is semantically meaningful instead of just raw long/lat coordinates.

My trusty e-reader, which I’ve written about before. It currently runs stock firmware, but I recently learned about an exciting custom firmware I had missed, koreader, which looks like it solves some of the PDF problems I had bemoaned before. We’ll see if I can scrounge up some time to check it out.

# Desktop Software

## Emacs

~~Text editor~~ Operating system. What org-mode is layered on top of. If you’re clicking around with a mouse to move to the beginning of a paragraph so you can edit there, instead of hitting a couple of keys, you’re doing it wrong.

Also make sure to map your caps lock key to be another control, which is easily one of the higher impact things on this list that you can do today, even if you will never use Emacs. Now, you don’t have to contort your hand to reach the control keys when you copy-paste, or when you issue a stream of Emacs commands.

## Ubuntu

Running 16.04 LTS, with a ton of customization layered on top. For example, I replaced my window manager with…

## xmonad

Tiling window manager for Linux. All programs on the desktop are fully visible, all the time. This would be a problem with the number of programs I usually have open, but xmonad also lets you have tons of virtual desktops you can switch between with 2 key-presses. I suspect that this sort of setup covers at least part of the productivity gains from using additional monitors.

Caveat for the unwary: like org-mode, xmonad is power user software, which you can spend endless time customizing to an inane degree (to be fair, it’s usually a smaller amount of endless time than org-mode).

## Redshift

Late night blue light is less than ideal. Redshift is a way to shift your screen color away from being so glaringly blue on Linux.

There are similar programs for other platforms:

However, the default behavior for most of these apps is to follow the sun: when the sun sets, the screen turns red. During the winter the sun sets at some unreasonable hour when I still want to be wide awake, so there’s some hacking involved to get the programs to follow a time-based schedule instead of a natural light schedule.

## Crackbook/News Feed Eradicator (Chrome extensions)

I’m sure you’re aware of how addictive the internet can be (relevant XKCD). These extensions help me make sure I don’t mindlessly wander into time sinks.

I use Crackbook by blocking the link aggregators I frequent, hiding the screen for 10 seconds: if there’s actual content I need to see, or if I’m deliberately relaxing, then 10 seconds isn’t too much time to spend staring at a blank screen. But if I just tabbed over without thinking, then those 10 seconds are enough for second thoughts, which is usually enough to make me realize that I’ve wandered over by habit instead of intention, and by that point I just close the tab.

The News Feed Eradicator is pretty straightforward: it just removes Facebook’s infinite feed, without forcing a more drastic action, like deleting your Facebook account. For example, it’s easy for me to see if anyone has invited me to an event[8], but I don’t get sucked into scrolling down the feed forever and ever.

This will not work for everyone: some people will go to extreme lengths to get their fix, and extensions are easy to disable. However, it might work for you[9].

# Things I Made To Help Myself

I made a personal tool to create the monthly/quinannual/annual newsletters I send to the world. It’s my hacked up replacement for social networking.

Throughout the month/year/life, I keep the tool up to date with what’s happening, and then at the end of the month it packages everything up and sends it in one email. It’s not strictly necessary, since I could just write out the email at the end of the month/year, but it feels like less of a time sink, since I’m spreading the writing out over time instead of spending a day writing up a newsletter, and that means I’m willing to spend more time on each entry.
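The collect-then-package idea is simple enough to sketch. This is not the actual tool, just a minimal illustration: the `Newsletter` class, its method names, and the subject line format are all hypothetical, and actually sending the message is left out.

```python
import datetime
from email.message import EmailMessage


class Newsletter:
    """Collects dated entries over time and packages one month into an email."""

    def __init__(self) -> None:
        self.entries = []  # list of (date, text) tuples

    def add(self, day: datetime.date, text: str) -> None:
        """Record one entry on the day it happened."""
        self.entries.append((day, text))

    def package(self, year: int, month: int, sender: str,
                recipients: list[str]) -> EmailMessage:
        """Build a single email from every entry in the given month."""
        selected = sorted(
            entry for entry in self.entries
            if (entry[0].year, entry[0].month) == (year, month)
        )
        body = "\n\n".join(f"{day.isoformat()}: {text}" for day, text in selected)

        message = EmailMessage()
        message["Subject"] = f"Newsletter for {year}-{month:02d}"
        message["From"] = sender
        message["To"] = ", ".join(recipients)
        message.set_content(body)
        return message
```

The returned message could then be handed to something like `smtplib.SMTP.send_message`; the point of the design is that `add` is cheap enough to call whenever something happens, so the end-of-month step is only assembly.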

## Writing Checker Tool

There are a number of writing checkers out there: some of them aren’t even human.

There’s the set of scripts a professor wrote to replace himself as a PhD advisor. There are some folks that are working on a prose linter (proselint, appropriately), which aims to raise the alarms only when things are obviously wrong with your prose (“god, even a robot could tell you ‘synergy’ is bullshit corporate-speak!”). There have been other attempts, like Word’s early grammar checker, and the obvious spellchecker, but they all stem from trying to automate the first line of writing feedback.

My own script isn’t anything exciting, since it uses other scripts to do the heavy lifting, like the aforementioned proselint and PhD scripts. So far the biggest thing I added to the linter is a way to check markdown links for doubled parentheses, like [this link](https://en.wikipedia.org/wiki/Solaris_(2002_film)): in renderers that end a link at the first closing parenthesis, unless the inner parentheses are escaped with \, the link won’t include the last ), probably preventing the link from working, and a dangling ) will appear after the link.
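For the curious, a check like this can be approximated with two regular expressions. This is a simplified sketch rather than the script itself: it assumes inline links that sit on one line, and the names `LINK`, `UNESCAPED_OPEN`, and `find_unescaped_paren_links` are invented for the example.

```python
import re

# Matches an inline link, "[text](destination)", capturing the destination
# up to the first unescaped ")". That is where a naive renderer ends the link.
LINK = re.compile(r"\[[^\]]*\]\(((?:\\.|[^\\)])*)\)")

# An opening parenthesis that is not preceded by a backslash.
UNESCAPED_OPEN = re.compile(r"(?<!\\)\(")


def find_unescaped_paren_links(text: str) -> list[tuple[int, str]]:
    """Return (line number, truncated destination) for each suspicious link."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in LINK.finditer(line):
            destination = match.group(1)
            # An unescaped "(" inside the destination means the first ")"
            # closed the inner pair, not the link itself.
            if UNESCAPED_OPEN.search(destination):
                problems.append((line_number, destination))
    return problems
```

Run on the Solaris link above, this reports the destination cut off at `Solaris_(2002_film`, which is the broken URL a reader would end up with.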

There are more things I plan on adding (proper hyphenation in particular is a problem I need to work on), but I’ve already used the basic script for almost every blog post I’ve written in 2016. Notably, it’s helping me break my reliance on the word “very” as a very boring intensifier, and it has made me think closely about whether all the adverbs I strew around by default are really necessary.

# Real Life

## The 7 Minute Workout

Exercising is good for you, but it wasn’t clear to me how I should exercise. Do I go to the gym? That’s placing a pretty big barrier in front of me actually exercising, given that gyms are outside and gym culture is kind of foreign to me. Do I go running? It’s a bit hard to do so in the middle of the city, and I’ve heard it’s not good for the knees[10]. Places to swim are even harder to reach than gyms, so that’s right out.

What about calisthenics? Push ups, sit ups, squats and the like. They require barely any equipment, which means I can do them in my room, whenever I want. While thinking about this approach, I came across the 7 minute workout as detailed by the NY Times. Is it optimal? Certainly not; it won’t build muscle mass quickly or burn the most calories[11]. Is it good enough, in the sense of “perfect is the enemy of good”? Probably! So I started doing the routine and have been doing it for 3.5 years.

I’ve made my own tweaks to the routine: I use reps instead of time, use dumbbells for some exercises, and swapped out some parts that weren’t working. For example, I didn’t own any chairs tall enough to do good tricep dips on, so I replaced them with overhead triceps extensions.

And, well, I haven’t died yet, so it’s working so far.

## Cleaning Checklist

After reading The Checklist Manifesto, I only made one checklist (separate from my daily Habitica list, which I was already using), but I have been using that checklist on a weekly basis for more than a year.

It’s a cleaning checklist. I use it to keep track of when I should clean my apartment, and how: not every week is “vacuum the shelves” week, but every week is “take out the trash” week. It has been helpful for making sure I don’t allow my surroundings to descend into chaos, which was especially helpful when I lived alone.

## Meditation and Gratitude Journaling

Meditation I touch on in an earlier blog post; it builds up your ability to stay calm and think, even when your instinct rages to respond. Gratitude journaling is the practice of writing down the things and people you are grateful for, which emphasizes to yourself that even when things are bad, there’s some good in your life.

I’m wary about whether either of these actually works, or is otherwise worth it, but lots of people claim they do, and to a certain extent, they feel like they do. In a perfect world I would have already run through a meta-analysis to convince myself, but I don’t know how to do that yet, so I just do both meditation and gratitude journaling; they’re low cost, so even if they turn out to not do anything it’s not too big a loss.

## Book/Paper Lists

It’s not just “I read this, on this date”: I also keep track of whether I generally recommend each one, along with a short summary of what I thought of it, which is helpful when people ask whether I recommend any books I read recently. On the flipside, I also use the list as a wishlist to make sure I always have something interesting to read.

That’s it for now! We’ll see how this list might change over the next while…

[1] And when I do make tools that make tools, should it be a broom or bucket?

[2] Obviously, this won’t work for everyone. If you’re not motivated by points and levels going upwards, but the general concept appeals to you, Beeminder might be more motivating, since it actually takes your money instead of imaginary internet points.

[3] Conceivably, you could make this work by creating tasks to take a certain amount of time (like 30 minutes) so each item is time based instead of result based, and treat that as Just The Way You Use The Habitica Todo List.

[4] Don’t worry, it’s more fleshed out than this: I’m not keen on doing something for the sake of doing something, like “write my magnum opus, doesn’t matter what it’s about”. Come on, it has to matter somehow!

[5] It’s certainly possible that I should try to edit faster, or move towards that short and repeated revise-edit cycle, but this is more about having a clear view of what I’m actually doing now, after which I can decide how I should change things.

[6] If you use the same password everywhere, then your password is only as secure as the least secure site you use. Suppose you use the same password at your bank and InternetPetsForum, and InternetPetsForum hasn’t updated their forum software in 12 years. If InternetPetsForum is hacked, and your password was stored without any obfuscation, the hackers are only a hop and skip away from logging into your bank account, too.

[7] I’m declining to state exactly which password manager to use; while security through obscurity isn’t really viable for larger targets, I’ve picked up enough residual paranoia that disclosing exactly which service/tool I use seems to needlessly throw away secrecy I don’t need to throw away.

[8] lol

[9] And if you want something that’s less easy to disable, then SelfControl or Freedom might be more your speed. I can’t personally vouch for either.

[10] Honestly not really a true objection, but saying “running is hard” makes me feel like a lazy bum. I already did 20 pushups, what more do you want?!

[11] If you are interested in optimality in exercise, I’ve heard good things about Starting Strength.
