[Review] Citizen Miami

Okay, I’ve put this off long enough. I’ve been meaning to post something about this for a semester and a half now, so I should just shove this out the door.

If you’re interning in the Valley, bikes are cool: the culture is bike-friendly, and a short internship likely doesn’t pay enough to justify buying a car. And need I point out that walking everywhere sucks? So by process of elimination, bikes are cool.

I opted for a folding bike, for some reason I can’t remember now. I got a Miami model from Citizen Bikes, which worked out. However, I could have just as easily gotten a normal bike for cheaper on Craigslist, and the folding feature really only saved me once or twice on the train (you can board Caltrain with a folding bike even when all the bike slots are taken). Also, folders are not very fast, even when you pedal flat out in the highest gear. There’s definitely a convenience-speed tradeoff there.

However, it did get me where I needed to go, and it didn’t break the bank. So, a note to you prospective Valley interns: give some thought to getting a bike.

Check your Assumptions

NOTE: yes, I’m getting around to this 2 weeks after the fact. I have long-lived drafts, okay?

If you haven’t been keeping up with tech news in a particularly frantic way, you might not have noticed that Google is changing their privacy policy. On one hand, this change results in a somewhat longer policy, but without the fragmentation across all of Google’s services, so that one need not read 5 different policies to use 5 different Google services. On the other hand, it means that data is shareable across application boundaries, such that your bountiful web searches can be used to customize which YouTube videos are shown to you, and vice versa.

The EFF, being who they are, issued an article alerting us to this state of affairs and how to ameliorate it: delete our web history. However, looking it over, I noticed that this is a ton of data. These records go all the way back to 2005; in other words, it’s a treasure trove of personal data. And you ought to know that I’m somewhat obsessed with personal metrics, and I’m not going to straight-up delete perfectly good data before getting my hands on it.

The problem is that Google didn’t offer a nice way to download the data. There’s an RSS feed, but I couldn’t just generate a single feed with 29k searches in it: instead, I had to step through it 1000 items at a time, and who wants to download 30 XML files by hand? Obviously, this was a job for a lazy-in-some-ways programmer.
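The paging itself is trivial to script. A minimal sketch of what I had in mind, with the caveat that the base URL and parameter names below are made-up stand-ins for illustration, not Google’s actual web-history endpoint:

```python
# Sketch of stepping through a paged RSS feed 1000 items at a time.
# The base URL and its query parameters are hypothetical stand-ins,
# NOT Google's actual web-history endpoint.
def feed_urls(total_items, page_size=1000,
              base="https://example.com/history?output=rss"):
    """Return one URL per page of up to `page_size` items."""
    return [f"{base}&num={page_size}&start={start}"
            for start in range(0, total_items, page_size)]

urls = feed_urls(29_000)  # 29 pages for 29k searches
```

Downloading each URL and saving the response is then a loop, whether in a shell script or in injected JavaScript.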

There was some prior work, but I threw it out for dumb reasons (CSV?? Flash?? It’s like we’re back in the stone ages!) (also, don’t tell curl I didn’t choose it). Plus, I wanted to learn how Chrome extensions work, and just thinking it through, it seemed like I could inject JavaScript into the history page, use AJAX calls to fetch the subsequent RSS feed files (since the injected JavaScript would sidestep the same-origin policy), and save them using the HTML5 FileSystem API. But would this all work? Just because you work it out in your head doesn’t mean it will work, especially with technologies you’re not familiar with.

After reading up on it, though, it still seemed feasible, so I spun up a git repo and started writing a Chrome extension. Then I ran into a small snag: how should I make sure I don’t download and save more files than needed? So I tried asking for an RSS feed too far in the past, expecting a 404. Instead, I found that Google had somehow squashed my 29k searches into 5 XML files, not 30.

Oh.

Now, downloading 5 files does not sound like much work: neither does 30, in the grand scheme of things, but 30 somehow manages to be more fearsome with the mere addition of a brotherly digit. Cut down to size, I just downloaded and saved the XML files containing essentially my entire search history, packed into a little over a megabyte of data. Also, it was 30 minutes until midnight, and tarrying wouldn’t do me good (of course, the privacy policy might have already shifted, say, when midnight appeared over the International Date Line, but nevermind…) (also, I couldn’t find where Chromium actually, you know, saved the File data).

However, it might have balanced out for the best: I learned how to write basic Chromium extensions, and I got my data down, safe and sound. I suppose there’s a moral in here somewhere… oh! In the title! Check your assumptions, fail fast, etc., etc.

Methinks I have to work on my fable telling.

[Review] Accelerando

After I finished Accelerando (yes! it’s free (as in beer)!) I could not stop the thought that sums it all up from cycling over and over through my mind.

That was a mindfuck through my eyesockets.

And it was awesome. Now, we happen to be on the internet: there’s a nigh-infinite amount of mindfuckery at our disposal, but this was a cream-of-the-crop mindfuck, the fine aged mindfuck grown in the rustic rolling hills of Cthulhu’s domain, uncorked in the presence of distinguished guests you actually like.

Truly, it’s been a while since I’ve read a visionary piece that’s also not totally mushy sci-fi, possibly not since reading some Cory Doctorow (especially his story “0wnz0red”), and boy, is it a heady drink. The term future shock gets bandied about in the story, but it forgets to break the fourth wall and address the reader, who is almost certainly future-shocked from being thrown into an unremembered past at hundreds of years per hour, all without leaving the same room.

It’s not just a mindfuck: it also managed to change my mind about what I should be doing in the near future (which puts it on par with HPMOR). Robots are still cool, but metacortices? Might want to get those out of the way first.

Also, Accelerando includes galaxy-wide civilizations that run timing attacks on the substrate of the universe. Really, how awesome is that?

Enough with the gushing praise: what about the story? As a vision of the technological singularity, it takes the stance that one can indeed see past the intelligence explosion (contrary to other singularity theories), but only by following non-uploaded humans skirting past the growing Matrioshka brain. It is somewhat strange, since it posits uploads running the show (starting with lobsters) until they self-modify into something entirely non-human, supposing that uploads are easy to modify. However, knowing that we are godshatter, it can’t possibly be that easy, so the Less Wrong FAI vision seems more likely. But no one ever asks for likelihood with their mindfuck, do they?

I’d also like to compare Accelerando to The Forever War, another fairly hard sci-fi book (maybe an 8 on the Mohs scale?) that I read recently (during fall semester). I believe The Forever War was written before computing systems became fixtures in everyday life and uploads were a distinct possibility, so Haldeman puts a clone hive-mind in control of humankind instead of Matrioshka brains the size of the solar system. However, The Forever War boils down to an old-fashioned love story, whereas Accelerando defenestrates that trope post-haste and instead shoves unalloyed future into your face.

And having read more works by Charlie Stross, I can say that he’s going to jostle with Cory Doctorow in my pantheon of sci-fi authors. Or maybe head higher.

I highly recommend you go get this vision of the future. Happy mindfucking!

Note: Stross doesn’t think the scenarios he details in Accelerando are very likely. However, it’s a good ride.

[Review] Visualizing Data

If you are not already acquainted with the works of Edward Tufte, you ought to go fix that, especially if you happen to be a younger me who could use a badassery boost later in life. Basically, Tufte encourages thought about data display beyond the usual “plot and run away” touted by most anyone, chides those who try to deceive with displays, and fires the imagination with wondrous examples.

However, Tufte writes for data mongers that are crafting displays for a popular audience. What if your audience is scientific, sophisticated, and yet still looking for that “interocular impact”? Then, as an Amazon reviewer once pointed out to me, maybe Visualizing Data by William Cleveland could help you.

Certainly, the target audience of Visualizing Data is the scientific community, eschewing any amount of glitz for effective visualization techniques (as does Tufte) with a solid grounding in math (nevermind). Instead of searching for visualization techniques suited only to a few scenarios (such as Minard’s Napoleon chart), Cleveland takes a clinical approach, developing a few nearly orthogonal techniques and extending them into ever-increasing dimensions (the loess is a particularly intriguing example).
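Since the loess gets a shout-out, here’s a minimal sketch of the idea: fit a weighted straight line through the points near each location, with nearer points weighted more (tricube weights). This is my own stripped-down illustration of the classic degree-1 formulation; Cleveland’s actual procedure has more machinery (robustness iterations and so on).

```python
# Toy degree-1 loess: locally weighted linear fit with tricube weights.
# A stripped-down illustration, not Cleveland's full procedure.
def loess_point(x0, xs, ys, frac=0.5):
    """Smoothed value at x0, using the nearest frac*len(xs) points."""
    k = max(2, int(frac * len(xs)))
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1.0  # neighbourhood radius (k-th nearest point)
    # tricube weights: 1 at x0, falling to 0 at the edge of the window
    w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xb = sum(wi * x for wi, x in zip(w, xs)) / sw
    yb = sum(wi * y for wi, y in zip(w, ys)) / sw
    # weighted least-squares slope of the local line
    num = sum(wi * (x - xb) * (y - yb) for wi, x, y in zip(w, xs, ys))
    den = sum(wi * (x - xb) ** 2 for wi, x in zip(w, xs))
    slope = num / den if den else 0.0
    return yb + slope * (x0 - xb)
```

Evaluating this at a grid of x0 values traces out the smooth curve; the `frac` parameter trades wiggliness against bias, which is exactly the sort of knob Cleveland spends pages teaching you to turn thoughtfully.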

Trooping further into the book and into higher dimensions also brings different ways of looking at the data, such as stereographic techniques for viewing 3D data, which are interesting in themselves. Notably, Cleveland also develops methods for finding subtle dependencies in these higher dimensions, bypassing the intuitive but flawed solution of merely dishing out two-dimensional plots.

This all boils down to Cleveland’s main mission of battling “rote data analysis”: the application of statistical techniques without critical thought. The bulk of the book is taken up with applying the various tools developed to real-world data sets, augmenting the case studies with comments on how to go about the analysis effectively. He also compares and contrasts his results with previous studies done on the very same data sets by people likely to have used rote data analysis, as a none-too-subtle indication of the sort of fate awaiting those who use least-squares without thinking. Overall, the style is instructive and helpful, and quite the pedagogical win.

However, I noted earlier that Tufte is meant for data mongers creating displays for the general public, which implies that Cleveland is not so suited to the same. Indeed, tools such as q-q, r-f, or m-d plots are not part of the public’s visual literacy; I had to read the explanation of q-q plots a second time before grasping what was going on. To generalize: you would use Cleveland’s methods to convince your colleagues, and Tufte-like displays to convince everyone else.
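For the curious, the core of a q-q plot is simpler than its explanation: plot matching quantiles of two samples against each other, and if the points hug the line y = x, the samples come from similar distributions. A minimal nearest-rank sketch of the quantile pairing (my own construction, not Cleveland’s exact recipe):

```python
# Compute the (quantile, quantile) point pairs behind a q-q plot.
# Nearest-rank quantiles for simplicity; real implementations interpolate.
def qq_pairs(xs, ys, n=25):
    """Return n (quantile of xs, quantile of ys) pairs."""
    def quantile(sorted_vals, q):
        idx = min(int(q * len(sorted_vals)), len(sorted_vals) - 1)
        return sorted_vals[idx]
    xs, ys = sorted(xs), sorted(ys)
    qs = [(i + 0.5) / n for i in range(n)]
    return [(quantile(xs, q), quantile(ys, q)) for q in qs]
```

If one sample is just the other shifted by a constant, every point sits a constant distance off the y = x line, which is exactly the kind of structure the plot makes obvious at a glance.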

While I borrowed this book as part of my “read all the books in the library” campaign, I may be compelled to buy it anyway. If you’re in the business of playing with data (not necessarily machine learning domains, because trying to display hundreds of dimensions just will not work), then I highly recommend checking this book out.

Devfest 2012

The 2nd annual Devfest concluded… half a week ago, now. It was also my last, but never mind that (you mean lounging around at student hackathons is NOT a viable post-academia career path??).

You can already find out the whats and whens and wheres of Devfest, and perhaps the hows and whys if you tie up and interrogate enough ADI Committee members, so that’s not so interesting. What you ought to be interested in are thoughts, probably mine, about the event(s). I mean, that’s why you guys are here, right? For access to poorly filtered mind dumps and rambling incoherent monologues, and that’s before we even narrow it down from the internet at large. It’s either that or cats, so if you’re here for the other thing you should probably go somewhere else.

Overall, the quality definitely improved over last year, mostly thanks to the lessons learned from all the mistakes made last year. In particular, we got very good audience retention rates for lectures and workshops across the week with the introduction of focused theme nights, and having the Demofest on Saturday was a big win (overlap with the Super Bowl: bad). That, and prizes were a nice addition, although they are a bit of a double-edged sword (partly mitigated by a relatively large spread of prizes).

As for mistakes, we of the ADI committee have an entire document dedicated to preserving our mistakes for future generations to squint at and ignore. So instead of musing on those, I’ll talk about some larger, architectural things that I find off-putting.

First, Devfest is not scalable in its current form. In trying to include non-programmers (which we failed at, with notable exceptions) by extending the length out to a week, we get an event that excludes most everyone outside the “Columbia Bubble”: contrast this with something like PennApps, which consumes little more than a weekend but makes it possible for students from across the upper-eastern seaboard to attend (disclaimer: I’ve never been to PennApps. Sadface). If we want to grow past the impact we’ve been having, then we either need to convert more people into a Devfest-ready audience, or convert to a geographically scalable model (actually, the former is probably much more desirable in terms of our mission. But I digress).

Second, it turns out that the ADI committee is too large. I used to think we should fit whatever people we could into our (ADI’s) maw, but now I’ve seen firsthand what it means for communication difficulties to scale with the square of the group size (n people means n(n−1)/2 potential pairwise channels). Now I have nary a clue what many committee members are doing, or whether they were even involved with our flagship event, and I would probably leave a person or two out if I tried to enumerate the committee off the top of my head. What I would like is a metacortex à la Accelerando (which I’ll review soon) shared across the entire committee, but we know that isn’t going to be possible for a while, so I should get off my ass and make it happen.

It is going to be interesting if we take on new members: we probably hooked a few people’s interest with that Devfest stunt. If we keep the rigid subcommittee structure, we could use some augmentation in the less-populous subcommittees. However, I do have an idea that would put me in contact with more people across the committee, which is to reorient the entire committee toward a task-force-driven model, where each event gets its own task force drawn from the committee pool, probably combined with a blind system while playing to people’s strengths. On the other hand, I’m typing this at 4am, so there’s a strong chance this is just crazy talk.

As for the projects, I guess the only one that really held my attention was Sid’s Scheduler (although the seating algorithm by Aditya and Zack was a close second), mostly because it somewhat jibes with my thoughts of how to kickstart my metacortex and make it useful from the start.

Oh, and my own project? I was (and still am) working on a replacement for IRC with regard to ADI: IRC is inherently stateless and persistent-identityless, and an inheritor of old command traditions while lacking new ones (Twitter tags and mentions come immediately to mind), which makes it essentially unusable for a loosely connected group that’s also pretty geographically synchronous (compared to standard companies that guarantee engagement for 8 hours a day). Add in difficult onboarding for newbies, and you have a mess. Hence, something that fixes all that, while also piling all sorts of smart digests/notifications on top. And when you decide to learn a new technology with it, that seals it. However, I only reached parity with the status quo of chat services over the span of Devfest, so I didn’t end up presenting something that boring. I do think this project is important, though, so I’ll probably get it to a version 1 at some point (MVP doesn’t really make sense here, since there are plenty of ready alternatives. They’re just not *perfect*).

But Devfest? It’s the little things: chilled applesauce, refactoring code, teaching someone the basics of python in 5 minutes, throwing k-means at a problem and calling it good.

And as the night is shot through with blue, I swear I’ll never do this again. But I always seem to come back.

21 – … and a new beginning.

A biting cold, pressing crowds, a barrage of sound, hints of snow filtering past enormous buildings, a touch of music, a place to lay a tired head.

Hello again, New York. I’m back.

20 – The End

And here I stand on a cusp, peering down into a squall. I myself, my core, will make it through: just as certainly, I won’t make it through unscathed, emerging a changed person.

For the better?

Let’s make it so.

19 – And Upward, Ho

A contrived world I find myself fallen into: a little light filters down through the haze, barely illuminating brazenly artificial structures that I must scale, to reach the overworld once more. My current guide brings forth a feeble light, and leads the way.

Why this particular task is a rite of passage for our culture, I cannot say: glimpses of the overworld afforded to us show me and my brethren who also undertake this task that the mountains we will climb after exiting this cave are natural, lacking the sharp, geometric corners and flat faces that make up these artificial counterparts.

When we started at the very bottom of these mountain ranges, oh so very long ago, we were afforded more leeway, the first guides carrying bigger lanterns. But how slowly they went, and as future guides strode with more urgency, carrying smaller lamps, one had to learn to stay on the path, lest one be lost in the gloom. In between thoughts of where to place another step, perhaps a small thought registers an interesting piece of architecture passing by before it is swallowed by the twilight. If thoughts wander too far, then I am liable to slip, slip off the peaks and cliffs to who knows where.

Not only is an uncertain end reason to keep to the straight and narrow: if I reach the goal first, then I win. Everyone has told me so: my parents, every one of my guides, and even my peers would offer up grudging admiration for having outrun them with stoic determination. To what end is the winning aimed? I do not know, but surely it is worth it.

And, up ahead! A flash of light, which my guide tells me comes from the machinery that drives the lift at the goal. The lift that will lift me and my peers to the overworld. It looks to come from a place not far off: I only have to put down my head and trudge forward a bit more, and it will be over.


An allegory for academia, into which I couldn’t fit everything I wanted to say. Hence, this addendum.

First, the bad: it’s not as easy as throwing yourself off a cliff and finding there’s a portal to the overworld that catches you. No, your parents had to pay coin to allow your passage through the trials, and saddle you with some of the debt, so if you take the easy way out you still have the associated debt without the prestige that comes from weathering the trials.

Then, the good: when I leave school, I’ll also be leaving behind a nice, structured life that didn’t require much of me outside of studying and trying to maximize my point spread (extracurriculars, though…). In particular, I’ll be leaving behind friends of goodly caliber, and the density of awesome people that made meeting all of them possible. These are the good things I’ll be leaving behind, whose loss I’ll have to work to combat (less the first than the second).

Finally, I don’t think Universities are optimizing for teaching: there’s not a tight feedback loop in which the teachers get to know which methods have the greatest impact, and are moved to use those methods. This might be changing, but it’s gonna get ugly before it gets better.

And that is why I am looking forward to leaving academia.

[15-18] – Okay, what?

You know what, I just wrote a ton of stuff covering some projects, like dod3catgraph and pensievr, and also documenting my time at Redwood Systems. I’m counting that as four days of writing, because it is a ton of writing, and unfortunately I’m the one holding the reins in this particular regime. I still have to put something out for today, though, so heads up for a post soon.

14 – Metapost: Oh god when will it end

Another week, another meta-review.

  • No, I’m not going to do a meta-meta post.
  • A Year in Review

    I have qualms about using lists as a lazy shortcut for miles of prose: however, I am a programmer, and as such will take lazy shortcuts whenever possible (including now). Thinking back to the writing of the post, I mostly pulled the events populating the list from previous blog posts, which is biased: it does get most of the major events, though. I don’t think breaking up the year-end review and the resolutions was a good idea (there could be interesting interplay between the two), but honestly, I did want to fill up some more posts. I was also probably sleepy.

  • Resolutions

    Short list, which is good (few resolutions to keep track of) and bad (I was probably sleepy when I wrote them), or maybe really bad (I only gave the list a few minutes of thought). Oh well, it’s not like it’s a binding list of promises: also, I would like to do these things.

  • Semester in Review

    Exposition, exposition, exposition about academics and class reviews that would be much better put on culpa.

  • The Craft

    This was written in the full swing of my work on Pensievr, which also explains why it was 3 lines. If I kept writing, then my dad probably would have woken up and found me still writing.

  • Late for a Job?

    This one was borne out of the fact that I was mostly done with Pensievr at some ungodly hour of the morning (similar to the last post), and of something that had been on my mind. It’s strange, writing down thought processes that flash through your mind in what feels like half a second…

  • Winter Break Movie Reviews

    I had a draft sitting around, and I don’t plan on watching more movies. The reviews aren’t worthy of being called movie critiques, but that also wasn’t the point. However, it also constricts the population to whom the writing is interesting.

Overall: Still not writing fast enough. Not much to say about that: I might be getting faster, though, mostly by relying on a stream of consciousness style of writing. I’m also still writing around midnight: I’ve only managed to publish one of these posts before midnight. I should fix that sometime.