Tape is HOW expensive?

Maybe you've seen that hard drive prices aren't falling so quickly. Maybe you've seen the articles making claims like "tape offers $0.0089/GB!"[1], looked at recent hard drive prices, and seriously thought about finally fulfilling the old backup adage "have at least 3 backups, at least one of which is offsite" with some nice old-school tape[2].

So you'd open up a browser to start researching, and then close it right afterwards in horror: tape drive prices have HOW many digits? 4? The prices aren't even just edging over $1000; they're usually solidly into the $2000s, or higher. Maybe then you start thinking about just forking over all your money to The Cloud™ to keep your data.

But maybe it's worth taking a look and seeing exactly how the numbers work out. As an extreme example, if you can buy a $2000 device that gives you infinite storage, then that is a really interesting proposition[3]. Of course, the media costs for tape aren't zero, but they are cheaper than the equivalent capacity in hard drives. Focusing in, the question becomes: at what point does the lower cost of each additional terabyte of tape overcome the fixed cost of the tape drive, making tape systems competitive with hard drives?

Some background: the common tape formats are defined by the Linear Tape-Open Consortium (LTO)[4], which periodically defines bigger and better interoperable tape formats, helpfully labeled LTO-N. Each jump in level roughly corresponds to a doubling of capacity: LTO-3 holds 400GB/tape, while the recent LTO-8 holds 12TB/tape.
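A quick way to sanity-check the "roughly doubling" claim is to divide each generation's native capacity by the previous one's. This is a minimal sketch; the capacities are the published native (uncompressed) figures for LTO-3 through LTO-8, and `generation_ratios` is just a hypothetical helper name.

```python
# Published native (uncompressed) capacity per tape, in TB, for LTO-3..LTO-8.
NATIVE_TB = {3: 0.4, 4: 0.8, 5: 1.5, 6: 2.5, 7: 6.0, 8: 12.0}


def generation_ratios(capacities):
    """Return {level: capacity[level] / capacity[previous level]}."""
    levels = sorted(capacities)
    return {
        new: capacities[new] / capacities[old]
        for old, new in zip(levels, levels[1:])
    }


for level, ratio in generation_ratios(NATIVE_TB).items():
    print(f"LTO-{level - 1} -> LTO-{level}: x{ratio:.2f}")
```

The ratios land between about 1.7x (LTO-5 to LTO-6) and 2.4x (LTO-6 to LTO-7), so "roughly doubling" is a fair summary.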

And some points of clarification:

  • LTO tapes usually have two capacity numbers; for example, LTO-3 tapes usually advertise themselves as holding 400 or 800GB. If you're lucky, the advertising material will suffix "(compressed)" sotto voce, notifying you that the 800GB number is inflated by some LTO-blessed, pie-in-the-sky compression factor. Ignore this; just look at the LTO level and its uncompressed capacity.
  • We usually talk about hard drives as a single unit (if you can see the individual hard drive platters, that means you are having a bad problem and you will not be storing data on that drive today), but tape is more closely related to the floppy/CD drives of yore, where media is freely exchangeable between drives.
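If a listing only quotes the compressed figure, you can back out the native capacity by dividing by the nominal compression ratio the LTO specs assume: 2:1 for LTO-3 through LTO-5 and 2.5:1 for LTO-6 through LTO-8. A minimal sketch (`native_capacity_tb` is a hypothetical helper); real-world compression depends entirely on your data, which is exactly why the native number is the one to trust.

```python
# Nominal compression ratio assumed by the advertised "compressed" capacity.
# LTO-3..LTO-5 advertise 2:1; LTO-6..LTO-8 advertise 2.5:1.
NOMINAL_RATIO = {3: 2.0, 4: 2.0, 5: 2.0, 6: 2.5, 7: 2.5, 8: 2.5}


def native_capacity_tb(level, advertised_compressed_tb):
    """Undo the nominal compression factor to recover native capacity in TB."""
    return advertised_compressed_tb / NOMINAL_RATIO[level]


print(native_capacity_tb(3, 0.8))   # LTO-3 sold as "800GB" -> 0.4
print(native_capacity_tb(8, 30.0))  # LTO-8 sold as "30TB"  -> 12.0
```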

First, I gathered some hard numbers on cost. I trawled Newegg and Amazon for drives and media for each LTO level from 3 to 8, grabbing prices for the first 3 drives and 5 media from each source. Sometimes this wasn't possible: LTO-8, for example, is recent, and I could only find 2 different drives. I restricted myself to a handful of pricing examples because I didn't want to gather data endlessly (there are a lot of people selling LTO tapes), and I didn't want to try to judge, with startlingly little information to go on, whether unusually low/high prices were legitimate offers or signs that something was wrong with the seller/device. Whatever: I just got enough data to average it out[5].
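The averaging step amounts to grouping price samples by LTO level and taking the mean of each group. Here is a sketch of that, assuming the samples are stored as (level, price) pairs; the prices below are made up for illustration and are not the numbers behind this post.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (LTO level, drive price in USD) samples; NOT the real data.
drive_samples = [(5, 1100.0), (5, 1300.0), (5, 1200.0), (8, 3400.0), (8, 3600.0)]


def average_by_level(samples):
    """Group (level, price) pairs by LTO level and average each group."""
    groups = defaultdict(list)
    for level, price in samples:
        groups[level].append(price)
    return {level: mean(prices) for level, prices in sorted(groups.items())}


print(average_by_level(drive_samples))  # {5: 1200.0, 8: 3500.0}
```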

Second, I took the average media cost for each LTO level and the uncompressed capacity of that level, and computed the cost per TB. Admittedly, the later LTO levels should really look a lot more discretized: for example, storing 5TB and 10TB on an LTO-8 tape (which holds 12TB) costs exactly the same, while with LTO-3 you'd need around twice as many tapes for 10TB as for 5TB. However, treating everything as linear makes the analysis a lot easier and gives approximately correct answers. If it turns out that tape becomes competitive at some small multiple of a single tape's capacity, we can re-run the numbers.
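To see what the linear simplification glosses over, here is a sketch comparing the linear media cost (price per TB times storage) against the whole-tape cost (you can only buy tapes in whole units). The $120 LTO-8 tape price is a hypothetical placeholder, and the function names are illustrative.

```python
import math


def linear_media_cost(storage_tb, tape_price, tape_capacity_tb):
    """Linear approximation: pay only for the fraction of tape actually used."""
    return storage_tb * tape_price / tape_capacity_tb


def discrete_media_cost(storage_tb, tape_price, tape_capacity_tb):
    """Realistic cost: tapes are bought whole, so round the count up."""
    return math.ceil(storage_tb / tape_capacity_tb) * tape_price


# Hypothetical $120 LTO-8 tape with 12TB native capacity.
for tb in (5, 10):
    print(tb, linear_media_cost(tb, 120.0, 12.0), discrete_media_cost(tb, 120.0, 12.0))
```

Both 5TB and 10TB cost one whole tape, while the linear model charges them proportionally; the gap shrinks as storage grows to many tapes' worth.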

Then it's just a matter of solving a couple of linear equations, one representing tape's fixed and variable costs and the other representing hard drive costs. To capture some variability in hard drive cost, I compared tape against both a hypothetical cheap $100/4TB drive and a pricier $140/4TB drive[6].

\mathrm{Cost}_{\mathrm{Tape}} = \mathrm{TapeMedia/TB} \cdot \mathrm{Storage} + \mathrm{TapeDrive}
\mathrm{Cost}_{\mathrm{HD}} = \mathrm{HD/TB} \cdot \mathrm{Storage}

Finding the storage point where the costs become equal to each other:

\mathrm{Storage}_{\mathrm{competitive}} = \frac{\mathrm{TapeDrive}}{\mathrm{HD/TB} - \mathrm{TapeMedia/TB}}
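The break-even formula is easy to evaluate directly. This sketch encodes the two cost lines and solves for their crossing point, using the two hard drive prices from the text ($100/4TB and $140/4TB, i.e. $25/TB and $35/TB). The $2000 tape drive and $10/TB media cost are hypothetical inputs, not the values from the spreadsheet.

```python
def tape_cost(storage_tb, media_per_tb, drive_price):
    """Cost_Tape = TapeMedia/TB * Storage + TapeDrive."""
    return media_per_tb * storage_tb + drive_price


def hd_cost(storage_tb, hd_per_tb):
    """Cost_HD = HD/TB * Storage."""
    return hd_per_tb * storage_tb


def break_even_tb(drive_price, media_per_tb, hd_per_tb):
    """Storage (TB) at which tape and hard drives cost the same.

    Returns None when tape media is not cheaper per TB than hard drives,
    because then the tape line never catches up.
    """
    margin = hd_per_tb - media_per_tb
    if margin <= 0:
        return None
    return drive_price / margin


# Hard drive prices from the text; drive and media prices are hypothetical.
for hd_per_tb in (100 / 4, 140 / 4):
    s = break_even_tb(drive_price=2000.0, media_per_tb=10.0, hd_per_tb=hd_per_tb)
    print(f"HD ${hd_per_tb:.2f}/TB -> break-even at {s:.0f} TB")
```

With those placeholder inputs the crossing lands at roughly 133TB against cheap drives and 80TB against pricier ones. The `None` guard covers the case where tape media isn't actually cheaper per TB, so there is no crossing at all.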

When we solve with some actual data (Google Sheets), LTO-5 (1.5TB/tape) gives the smallest competitive capacity. And yet, it doesn't look good: compared against expensive hard drives, we need to be storing ~100TB to become competitive, and compared against cheap hard drives, we need ~190TB to break even.

So I did some more sensitivity analysis: right now, drives and media for the recent LTO-7 and LTO-8 standards are expensive. Will our conclusions change when LTO-7/8 equipment drops to current LTO-5 prices? Compared against expensive hard drives, the minimum competitive capacity drops to ~65TB, but that assumes no further hard drive R&D, and it's still way above the amount of data I will want to store in the near future[7].
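The sensitivity question can be explored by sweeping the inputs. This sketch varies the tape drive price and the hard drive price per TB over hypothetical values (swap in real quotes) while holding media cost fixed; it returns infinity when tape media isn't cheaper per TB than hard drives.

```python
def break_even_tb(drive_price, media_per_tb, hd_per_tb):
    """Storage (TB) where TapeMedia/TB * S + TapeDrive == HD/TB * S."""
    margin = hd_per_tb - media_per_tb
    return drive_price / margin if margin > 0 else float("inf")


# Hypothetical scenarios; substitute your own price quotes.
drive_prices = (3000.0, 2000.0, 1000.0)  # tape drive getting cheaper
hd_prices_per_tb = (35.0, 25.0, 20.0)    # hard drives getting cheaper too
media_per_tb = 10.0

for hd in hd_prices_per_tb:
    row = [break_even_tb(d, media_per_tb, hd) for d in drive_prices]
    print(f"HD ${hd:>5.2f}/TB:", " ".join(f"{tb:7.1f}" for tb in row))
```

Break-even storage scales linearly with the drive price, so halving the drive price halves it, while cheaper hard drives shrink the denominator and push it back up. That second effect is the "no further HD R&D" caveat in numbers.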

In retrospect, it should have been more obvious to me that the huge fixed cost of a tape drive, combined with non-minuscule variable costs, just doesn't make sense for any data installation that doesn't handle Web Scale™ data.

And that's not even fully considering all the weird hurdles tape has:

  • It's unclear whether there are RAID-like, tape-appropriate filesystems or data structures, especially when you don't have N drives you can write to at the same time. You can read stories about people wrestling with tape RAID, but it doesn't seem to be a feature of the standard Linear Tape File System (LTFS).
  • Related to the previous point, you'll need to swap tapes once one of them fills up. And if you want media redundancy, you'll need to do a media-swapping dance every time you back up. Having to manage backup media isn't great when you're trying to make backups so easy they're fire-and-forget.
  • Tape drives are super expensive, which makes a single drive a giant single point of failure. Having redundant drives means you need even more data before tape becomes competitive with normal hard drives.
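Both of the last two hurdles are easy to quantify. Since break-even storage is the fixed cost divided by the per-TB savings, buying a second drive for redundancy doubles it, and the number of tapes to write and swap per backup grows with the number of copies. A sketch with hypothetical prices ($2000 drives, $10/TB media, $25/TB hard drives) and a hypothetical 20TB data set on LTO-5 tapes (1.5TB each):

```python
import math


def break_even_tb(n_drives, drive_price, media_per_tb, hd_per_tb):
    """Break-even storage when buying n_drives tape drives.

    Assumes hd_per_tb > media_per_tb (otherwise tape never breaks even).
    """
    return n_drives * drive_price / (hd_per_tb - media_per_tb)


def tapes_per_backup(storage_tb, tape_capacity_tb, copies):
    """Whole tapes to write (and swap in) for one full backup with `copies` copies."""
    return copies * math.ceil(storage_tb / tape_capacity_tb)


# Hypothetical numbers, not measured prices.
print(break_even_tb(1, 2000.0, 10.0, 25.0))
print(break_even_tb(2, 2000.0, 10.0, 25.0))
print(tapes_per_backup(20.0, 1.5, copies=2))
```

With these numbers, two drives push break-even from about 133TB to about 267TB, and keeping two copies of 20TB means writing and swapping 28 LTO-5 tapes per full backup.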

So we've arrived at the same conclusion as our gut: tape is overdetermined to be a bad idea for the typical consumer. If you can get really cheap clearance/fire-sale drives, it might become worth it, but keep in mind the other concerns listed above.

Data and analysis available on Google Sheets.

[1]  Which doesn't initially sound very impressive, given that Backblaze's B2 offers $0.005/GB. However, that's an ongoing monthly cost: two months is enough to put tape back into the game, at least according to the linked Forbes article. (I also remember more impressive numbers in other articles, but maybe that's just my memory playing tricks on me.)
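The "two months" figure falls out of a one-line calculation: divide the one-time tape media cost per GB by the monthly cloud price per GB and round up. A sketch using the two prices quoted in this footnote; it deliberately ignores the tape drive itself, egress fees, and price changes, so it only compares media against storage rent.

```python
import math

# Prices quoted in this footnote (USD per GB).
TAPE_MEDIA_PER_GB = 0.0089  # one-time cost of tape media
B2_PER_GB_MONTH = 0.005     # recurring monthly cost of Backblaze B2 storage


def months_until_cloud_matches(one_time_per_gb, monthly_per_gb):
    """First whole month at which cumulative cloud spend reaches the one-time media cost.

    Ignores the tape drive, egress fees, and any price changes over time.
    """
    return math.ceil(one_time_per_gb / monthly_per_gb)


print(months_until_cloud_matches(TAPE_MEDIA_PER_GB, B2_PER_GB_MONTH))  # 2
```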

[2]  Tape has nice properties beyond a lower incremental storage cost. It's offline rather than constantly online: once someone has access to a hard drive, they can quickly overwrite any part of it, but tapes that aren't physically in the drive can't be reached at all, which makes it much more difficult to destroy all your data (say, in a ransomware attack). Tapes are also possibly more stable in terms of shelf life, and you can theoretically write to them faster than to hard drives.

[3]  If nothing else, owning as many universe breaking/munchkin approved pieces of technology seems like a good policy.

[4]  Sure, you can use VCRs for storage with ArVid, but at 2GB per 2-hour tape it is not competitive at all. It could probably be made to work better, since it uses only 2 luminance levels instead of the full 256+ gradations, but the graininess of home videos doesn't give me hope for much better resolution. Plus, even if you do all that extra work, you'll only end up with capacity comparable to current Blu-rays. And where are you going to find a bunch of VCR tapes these days?

[5]  Taking the median is probably better for outlier rejection, and taking the minimum price in each category would probably be a good sensitivity analysis step. I don't believe either choice drastically changes the output for me, since I have relatively small amounts of data to store, but you might want to run the numbers yourself if you have more than, say, 20TB to store.
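To see why the median is the more outlier-resistant choice, here is a sketch with a made-up set of tape prices containing one suspiciously cheap listing; the numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical tape prices (USD), including one suspiciously cheap listing.
prices = [25.0, 27.0, 28.0, 30.0, 8.0]

print("mean:  ", mean(prices))    # dragged down by the outlier
print("median:", median(prices))  # robust to the single outlier
print("min:   ", min(prices))     # optimistic bound for sensitivity analysis
```

The single $8 listing pulls the mean down to 23.6 while the median stays at 27.0; the minimum (8.0) is the optimistic bound you'd plug in for the sensitivity check.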

[6]  It's true that there will likely be some additional hardware costs to actually connect more than 12 hard drives, but if nothing else you could go the storage pod route and attach 60 drives to a single computer, so we'll just handwave away the extra costs.
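For a sense of scale, here is a sketch counting how many 4TB drives (the size assumed in the main text) it takes to reach the ~100TB and ~190TB break-even points, and how many 60-drive storage-pod-style enclosures that implies. The helper names are hypothetical.

```python
import math


def drives_needed(storage_tb, drive_capacity_tb=4.0):
    """Number of whole hard drives required to hold storage_tb."""
    return math.ceil(storage_tb / drive_capacity_tb)


def pods_needed(n_drives, drives_per_pod=60):
    """Enclosures needed if each holds drives_per_pod drives (storage-pod style)."""
    return math.ceil(n_drives / drives_per_pod)


for tb in (100, 190):  # approximate break-even points from the main text
    n = drives_needed(tb)
    print(tb, "TB ->", n, "drives,", pods_needed(n), "pod(s)")
```

Both break-even points fit in a single 60-drive enclosure (25 and 48 drives), which is why the extra hardware can reasonably be handwaved.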

[7]  Honestly, I'm not even breaking 1TB at the moment.