NOTE: yes, I’m getting around to this 2 weeks after the fact. I have long-lived drafts, okay?
The EFF, being who they are, published an article alerting us to this state of affairs and explaining how to ameliorate it: by deleting our web history. Looking it over, though, I noticed that this is a ton of data. These records go all the way back to 2005; in other words, it’s a treasure trove of personal data. And you ought to know that I’m somewhat obsessed with personal metrics, so I’m not going to straight-up delete perfectly good data without getting my hands on it first.
The problem is that Google didn’t offer a nice way to download the data. There’s an RSS feed, but I couldn’t just generate a single feed with all 29k searches in it; instead, I had to page through it 1,000 items at a time, and who wants to download 30 XML files by hand? Obviously, this was a job for a lazy-in-some-ways programmer.
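The plan, roughly, was a paging loop. Here’s a minimal sketch of it in Python rather than extension code; the feed URL and its num/start parameters are assumptions for illustration, not a documented API, and it skips authentication entirely (the real feed needs your logged-in Google session, which is part of why a browser extension is handy):

```python
import urllib.error
import urllib.request

# Hypothetical feed URL: "num" caps items per page, "start" is the offset
# into the history. Both are assumptions, not a documented Google API.
FEED_URL = "https://www.google.com/history/lookup?output=rss&num=1000&start={start}"

start = 0
page = 0
while True:
    try:
        with urllib.request.urlopen(FEED_URL.format(start=start)) as resp:
            data = resp.read()
    except urllib.error.HTTPError:
        break  # fail fast: a 404 (or similar) means we've run off the end
    if b"<item>" not in data:
        break  # an empty page is the other end-of-history signal
    with open(f"history-{page:02d}.xml", "wb") as f:
        f.write(data)  # save each page of search history as its own XML file
    start += 1000
    page += 1
```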
After reading up on it, it still seemed feasible, so I spun up a git repo and started writing a Chrome extension. Then I ran into a small snag: how could I make sure I didn’t download and save more files than needed? I tried checking whether a request for a feed too far into the past would just return a 404. Instead, I found that Google had somehow squashed my 29k searches into 5 XML files rather than 30.
However, it might have balanced out for the best: I learned how to write basic Chrome extensions, and I got my data down, safe and sound. I suppose there’s a moral in here somewhere… oh! In the title! Check your assumptions, fail fast, etc. etc.
Methinks I have to work on my fable telling.