5 THINGS: on Archive
1. What is Archive?
Ahh yes, Archive, the thing we rarely think about.
Archive is a lot like that high school yearbook you’ve got tucked away. You only look at it when someone from your past crawls out of the woodwork, and by then many memories of that time have faded.
The good part is that when you need it, it’s THERE. And that’s exactly what you want from an archive. In the media realm, archiving is all about reliability. It’s the bank safe and final resting place of your media, where nothing bad can ever, ever happen to it. It’s not, I repeat, it is not on your SAN. Or your NAS. Or that locally attached storage. All of those are volatile and have fairly limited shelf lives. Studies have shown that after 4 years, 20% of hard drives out in the wild are doorstop fodder, so we need an alternative. Enter Archive.
2. Where does Archive fit in?
When we look at various storage theories that involve archive, we come across the concept of tiered storage. Tiered storage speaks to the relative importance of media combined with how fast you can access it.
While the lines in the sand between storage tiers shift depending on your industry, we can draw some basic guidelines for folks working with video-based media. Let’s walk through the tiers of storage and how archive relates to the rest of your storage.
Tier I is your fast storage. This is the storage that must deliver the most throughput and speed, where availability is paramount over capacity. For VFX people this may be flash storage; for video people it’s usually DAS (Direct Attached Storage) or your SAN. Some companies will try to push NAS for Tier I video work, which I believe is a way to sell cheaper gear to unsuspecting folks, but that’s my tech cross to bear.
Next, we move to Tier II. Tier II is where you need capacity over availability. This is normally a holding pen for less frequently used media, or for projects that are finished but that you have a decent chance of revisiting in the near future.
Often, film and TV projects keep the high-res masters on Tier II while they edit with low-res proxies off of Tier I; they then relink to the high-res media on Tier II for output. This is where NAS solutions usually fit in.
I’m also seeing facilities begin to use LTO for Tier II, especially in reality television, where capacity is paramount and daily usage is minimal.
Archive is commonly referred to as Tier III in most scenarios. In enterprise deployments, archive can be Tier IV or even Tier V. In any case, it’s the last stop for your media. This is where reliability is valued over availability or capacity.
Often, Archive takes the form of LTO, cloud storage, or optical storage. These formats need massive amounts of redundancy and need to be less sensitive to the effects of Father Time. Since you’ll most likely never access most of this content again, you need it somewhere you don’t have to worry about it.
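Why does redundancy matter so much? Here’s a quick back-of-the-napkin sketch. It borrows the 20%-after-4-years hard drive figure from earlier purely for illustration, and it assumes each copy fails independently. That’s an optimistic assumption: copies sitting in the same building can be lost together to one fire or flood.

```python
def prob_all_copies_lost(per_copy_failure: float, copies: int) -> float:
    """Chance that every copy fails, ASSUMING failures are independent."""
    if not 0.0 <= per_copy_failure <= 1.0:
        raise ValueError("per_copy_failure must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # Independent events: multiply the per-copy failure chance once per copy.
    return per_copy_failure ** copies


# Illustrative only: reuse the 20% four-year hard drive failure figure.
for n in (1, 2, 3):
    label = "copy" if n == 1 else "copies"
    print(f"{n} {label}: {prob_all_copies_lost(0.20, n):.1%} chance of losing everything")
```

Each extra independent copy multiplies the risk down, which is the whole argument for redundancy in one line of math.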
3. What Archive options are there?
As mentioned, Archive usually takes the form of LTO, Cloud, or Optical storage.
Cloud is a fancy-schmancy way of saying “someone else deals with it”. This usually means a 3rd party with a data center full of machines and a whole lotta storage. So many of both, in fact, that if any machine or storage pool goes down, redundant systems safeguard against data loss. You’ll often hear their uptime described in nines; “five nines” means 99.999% uptime.
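So what does “five nines” actually buy you? Here’s a small sketch that turns an uptime percentage into allowed downtime per year. It assumes a 365.25-day year to average in leap years; note that uptime is about whether you can reach your data, not whether the data survives.

```python
# 365.25 days averages in leap years: 525,960 minutes per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year a service may be unavailable at a given uptime percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for nines, pct in [(3, 99.9), (4, 99.99), (5, 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{nines} nines ({pct}%): about {minutes:.1f} minutes down per year")
```

Five nines works out to roughly five minutes of downtime a year, and each nine you drop multiplies that by ten.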
This could be as simple as FTPing your media to a 3rd party webhost, or as involved as a managed archival platform running on Amazon S3 or another managed cloud provider.
Normally, storing the data is cheap, but retrieving that same data is the expensive part. Also, the paltry bandwidth into your facility will greatly reduce the availability of your media. In many cases, that’s not a huge deal (after all, this is archival, where immediate availability is the lowest priority), but it is something to plan for if you’re looking at consolidating your tiered storage.
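How much does that bandwidth bite? Here’s a rough sketch for estimating restore time. The `efficiency` factor (how much of your nominal link speed you actually get after overhead and other traffic) is an assumed 0.8, a guess rather than a measured number, and the 2 TB job is hypothetical.

```python
def transfer_hours(terabytes: float, megabits_per_sec: float, efficiency: float = 0.8) -> float:
    """Hours to pull `terabytes` over a link rated at `megabits_per_sec`.

    efficiency: ASSUMED fraction of the rated speed you really get (0.8 is a guess).
    """
    bits = terabytes * 1e12 * 8           # decimal TB, as drive and tape makers quote them
    usable_bps = megabits_per_sec * 1e6 * efficiency
    return bits / usable_bps / 3600       # seconds -> hours


# Hypothetical restore: a 2 TB project over a 100 Mb/s and a 1 Gb/s connection.
for link in (100, 1000):
    print(f"2 TB over {link} Mb/s: about {transfer_hours(2, link):.1f} hours")
```

Plug in your own link speed: at 100 Mb/s, a couple of terabytes is a multi-day restore, not a coffee break.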
Next we have LTO tape. If you’re unfamiliar, check out our last episode on LTO; tons of great info. In a nutshell, each LTO-7 tape can hold up to 6TB natively, and over 2x that if the data is compressed. With read speeds approaching 300MB/s and a shelf life of 15-30 years, LTO is a fantastic option. Single LTO-7 drives are under $5000, and autoloaders, multislot libraries, and robots are great workhorses if you’re serious about having an archive strategy.
Lastly, we have Optical. This uses established Blu-ray disc technology to store data. It’s kinda like that old CD changer you had in your car: a magazine of Blu-ray discs that data can be housed on.
Sony, one of the leaders in optical technology, offers ODA (Optical Disc Archive). Gen 1 of ODA ships with 1.6TB cartridges and read speeds approaching 140MB/s, with write speeds about half that.
Gen 2 of ODA, due for release later this year, comes close to doubling every Gen 1 spec… at about the same price.
The bonus is that optical solutions are random access, which means you can start pulling media back right away once the cartridge is loaded. This is different from LTO tape, where the drive has to wind forward or rewind to reach the data.
On top of that, Sony claims ODA discs last upwards of 50 years and can still function after being submerged in saltwater for several weeks.
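To put those LTO-7 numbers to work, here’s a small sketch for sizing a tape archive. It uses the 6TB native capacity and ~300MB/s read speed from above; treat 300MB/s as a best case, since real-world throughput with lots of small files is often lower. The `compression_ratio` is left at 1.0 by default because video is usually already compressed (ProRes, H.264 and friends), so drive compression rarely gets you that 2x.

```python
import math

LTO7_NATIVE_TB = 6.0     # native capacity per tape, from the text
LTO7_READ_MB_S = 300.0   # best-case streaming read speed, from the text


def tapes_needed(archive_tb: float, compression_ratio: float = 1.0) -> int:
    """Tapes for one copy of the archive; ratios > 1 only help if the media compresses."""
    return math.ceil(archive_tb / (LTO7_NATIVE_TB * compression_ratio))


def hours_to_read_full_tape() -> float:
    """Time to stream a full tape at the best-case read speed."""
    return LTO7_NATIVE_TB * 1e6 / LTO7_READ_MB_S / 3600  # TB -> MB, seconds -> hours


# Hypothetical 50 TB archive, one copy.
print(tapes_needed(50))
print(f"{hours_to_read_full_tape():.1f} hours to stream a full tape")
```

Double the tape count if you’re keeping a second copy offsite.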
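Here’s the same kind of math for ODA, using the Gen 1 figures above (1.6TB cartridges, ~140MB/s read, write at about half that). The Gen 2 line is hypothetical: it simply doubles capacity and speed to match the “double every spec” claim, not any published Gen 2 numbers.

```python
def hours_for_cartridge(capacity_tb: float, speed_mb_s: float) -> float:
    """Hours to read or write a full cartridge at a sustained speed."""
    return capacity_tb * 1e6 / speed_mb_s / 3600  # TB -> MB, seconds -> hours


GEN1_CAPACITY_TB = 1.6
GEN1_READ_MB_S = 140.0
GEN1_WRITE_MB_S = GEN1_READ_MB_S / 2  # "write speeds about half that"

print(f"Gen 1 full read:  {hours_for_cartridge(GEN1_CAPACITY_TB, GEN1_READ_MB_S):.1f} h")
print(f"Gen 1 full write: {hours_for_cartridge(GEN1_CAPACITY_TB, GEN1_WRITE_MB_S):.1f} h")

# HYPOTHETICAL Gen 2: every spec doubled. Twice the data at twice the speed
# means a full cartridge takes the same time; you just need half as many.
print(f"Gen 2 full read (doubled specs): "
      f"{hours_for_cartridge(GEN1_CAPACITY_TB * 2, GEN1_READ_MB_S * 2):.1f} h")
```

Keep in mind the write time is the one you’ll feel at the end of a project, since archiving means writing everything once.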
4. How do I plan for Archive?
You may think Archive is a set-it-and-forget-it scenario. Copy it off and move on. But oh no, there are things you need to do to ensure that, just because you’ve got it, you can actually find it… if you ever need to.
Time isn’t kind to our memories. Out of sight, out of mind. Thus, having a way to catalog the media so you can find where you put it is paramount. This can take the form of archive-format-agnostic asset management, or software specific to the medium you plan to back up to. Meaning, if you use Sony ODA for optical, you use the software that comes with ODA.
I’m a big fan of an agnostic asset management system. Not only can it track and reconcile data on any of your storage tiers, but it can also automate the writing and retrieval of your archival material. Also, most asset management systems use fairly industry-standard databases, so you still have a record of the archival data years down the road, even if one of your tech partners goes out of business.
Next, if you’re not using the cloud, where is your media physically being kept? On a shelf next to your Tier II storage, or housed offsite? Here in California, we never know when the big one will hit, so many folks ship LTO backups out of state. Flooding and fires also don’t play well with storage. This is where safes become an option, or, again, consider moving your stuff offsite.
Now this can get expensive, so you may want to consider charging clients for archival. Offset your cost by making it a service. You can offer to back up not just the final cut and elements, but all of the originals as well.
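To show what “a record you can still read years later” might look like at its simplest, here’s a hypothetical, format-agnostic catalog built on SQLite, a widely used database file format. This is not any asset management product’s schema; the table, columns, functions, and sample entries are all made up for illustration. The point is that one table can track media on LTO, ODA, or cloud side by side.

```python
import sqlite3

# HYPOTHETICAL schema: one row per archived file, recording where it lives.
SCHEMA = """
CREATE TABLE IF NOT EXISTS archive_catalog (
    id INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    file_path TEXT NOT NULL,
    medium TEXT NOT NULL,        -- e.g. 'LTO-7', 'ODA', 'cloud'
    volume_label TEXT NOT NULL,  -- tape barcode, cartridge label, or bucket name
    location TEXT NOT NULL,      -- shelf, offsite vault, provider region
    archived_on TEXT NOT NULL    -- ISO 8601 date
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog; use a real file path to keep it around."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, project, file_path, medium, volume_label, location, archived_on):
    """Add one archived file to the catalog."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO archive_catalog "
            "(project, file_path, medium, volume_label, location, archived_on) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (project, file_path, medium, volume_label, location, archived_on),
        )


def find(conn, search: str):
    """Return (file_path, medium, volume_label, location) for matching project or path."""
    pattern = f"%{search}%"  # SQLite LIKE is case-insensitive for ASCII by default
    return conn.execute(
        "SELECT file_path, medium, volume_label, location FROM archive_catalog "
        "WHERE project LIKE ? OR file_path LIKE ? ORDER BY file_path",
        (pattern, pattern),
    ).fetchall()


# Made-up sample entries, one on tape and one on optical.
conn = open_catalog()
record(conn, "Spring Promo", "masters/promo_v3.mov", "LTO-7", "TAPE0042", "Offsite vault", "2016-05-02")
record(conn, "Spring Promo", "camera/A001_C003.mxf", "ODA", "ODA-0007", "Shelf B", "2016-05-03")
print(find(conn, "promo"))
```

A real asset management system layers search, proxies, and automated writes and restores on top, but the core idea is the same: which file, on which volume, sitting where.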
5. How much does Archive cost?
Buying your own archive storage is priced differently from cloud archive, which is more of a rental model with a monthly cost. So, I’ve averaged prices out over 4 years. After 4 years, 80% of hard drives out in the wild still function, so it’s a good ballpark window to compare over.
What this doesn’t take into consideration is the cost of the infrastructure needed to use these storage mediums. LTO drives can be under $5000, while the tapes are under $200. Optical systems are newer and start around $6500.
And while an initial outlay of thousands of dollars may seem steep, consider the cost of data retrieval from a cloud archive on top of the recurring monthly cost of a rental model, which will only rise as you archive more and more data.
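If you want to run the own-versus-rent comparison on your own numbers, here’s a rough sketch over the same 4-year window. The LTO side uses the upper-bound drive and tape prices from above and counts one copy only. The cloud prices in the example call are made-up placeholders, not any provider’s real rates, so swap in your actual quotes. The cloud side models an archive that grows every month, so you pay rent on everything stored so far.

```python
import math

MONTHS = 48  # the 4-year comparison window


def owned_lto_cost(total_tb: float, drive_cost: float = 5000.0,
                   tape_cost: float = 200.0, tape_tb: float = 6.0) -> float:
    """One drive plus enough LTO-7 tapes for ONE copy (double the tapes for an offsite copy)."""
    return drive_cost + math.ceil(total_tb / tape_tb) * tape_cost


def cloud_cost(tb_added_per_month: float, storage_per_tb_month: float,
               retrieval_per_tb: float, tb_retrieved: float, months: int = MONTHS) -> float:
    """Rental model: each month you pay for everything stored so far, plus retrieval fees."""
    total = 0.0
    stored = 0.0
    for _ in range(months):
        stored += tb_added_per_month              # the archive keeps growing
        total += stored * storage_per_tb_month    # rent on all of it, every month
    return total + tb_retrieved * retrieval_per_tb


# Hypothetical facility adding 1 TB a month; cloud prices are PLACEHOLDERS.
print(f"Owned LTO, 48 TB: ${owned_lto_cost(48):,.0f}")
print(f"Cloud, placeholder rates: ${cloud_cost(1, 4.0, 20.0, 10):,.0f}")
```

Notice that the cloud total grows with the square of time when the archive keeps growing, while the owned side mostly grows by the price of another tape.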
Have archive concerns beyond these 5 questions? Ask me in the Comments section. Also, please subscribe and share this tech goodness with the rest of your techy friends. Archive is a great party starter.
Until the next episode: learn more, do more – thanks for watching.