[NOTE: This essay was commissioned by a client in December 2006. It’s the third in a series of old-yet-relevant position papers, now out of exclusivity, that I’m editing and posting. Things for the next five years look “similar”. There is no formal “conclusion”, as this is one section of a larger piece.]
Over the next five years, gross storage needs will double every other year, sparked by industry trends that avoid deleting anything, ever; continued bloat in software programs; increased user demand for larger-file storage; increased user demand for indefinite storage; increased user, corporate, and industry expectation of system-side backups and frequent snapshots; and the enabling factor of meteoric-disk-size -to- paltry-disk-cost ratios.
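To put a number on "double every other year": under steady exponential growth, five years of doubling every two years works out to a bit more than a five-fold increase. The sketch below is only an illustration of that arithmetic; the function name and the 10 TB starting figure are hypothetical, not taken from any real environment.

```python
def projected_storage(current_tb: float, years: float,
                      doubling_period_years: float = 2.0) -> float:
    """Project storage needs assuming smooth exponential growth.

    Assumption: capacity doubles once every `doubling_period_years`
    (two years, per the claim above), with no step changes.
    """
    return current_tb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    start_tb = 10.0  # hypothetical starting point, for illustration only
    for year in range(6):
        print(f"year {year}: {projected_storage(start_tb, year):.1f} TB")
```

Real growth is lumpier than this curve suggests, but it gives a rough planning floor: whatever you are storing today, budget for five to six times as much by the end of the period.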
Since the late 1990s, we have seen rapid acceleration of infinite data life. Storage vendors will use terms such as “information life-cycle management”, “information archiving” or “data warehousing”, but they all converge on the premise that corporate data life is no longer finite. The value of this is dubious, but irrelevant to argue: financial workers expect to be able to look at historical data for modelling purposes; draft and product workers expect to be able to look at long-dead projects that might now be of value with new knowledge; in the throes of bankruptcy, competent managers (and lawyers) will want to mine the archives for something… anything that may provide some value. Everything your organization has ever known is expected to be retained, indefinitely.
The average 10-page MS Word document in 1995 was 13K in size. The average 10-page MS Word document in 2006 is 1.4MB. While that size may still seem small, it’s indicative of a growing trend of software generating vastly wasteful content because it can. Software vendors don’t need to worry about their data fitting onto floppies anymore, so they don’t. Multiply this across dozens of applications, add in media, and you have truly huge data files with only a few pages of actual content.
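For a sense of scale, the two Word-document figures above imply roughly a 110-fold increase over eleven years, or a bit over 50% compound growth per year for the same ten pages of content. A minimal sketch of that calculation follows; the helper name is made up for illustration, and it assumes 1MB = 1024KB.

```python
def annual_growth_rate(start_size: float, end_size: float, years: int) -> float:
    """Compound annual growth rate between two sizes, as a fraction."""
    return (end_size / start_size) ** (1 / years) - 1


# Figures quoted in the text; the KB/MB conversion is an assumption.
size_1995_kb = 13.0
size_2006_kb = 1.4 * 1024

growth_factor = size_2006_kb / size_1995_kb
rate = annual_growth_rate(size_1995_kb, size_2006_kb, 2006 - 1995)

print(f"overall growth: about {growth_factor:.0f}x")
print(f"compound annual growth: about {rate:.0%}")
```

That rate is for file bloat alone, before counting any increase in the number of documents being kept.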
Similarly, the users want ever-larger files. Gone are the days of compressing graphics, video and audio to the Nth degree: users want full-quality content. They don’t want a 120×120 “thumbnail” video, they want something that takes some real-estate on their oversized monitor. As bandwidth increases, so will the user-desire for better content faster. They then want to save that same content to their network volume. They want it backed up in case of catastrophe (or their own error). What was a 3MB MP3 file is now a 45MB FLAC or WAV file sitting in your database.
The increase in user-end space (desktop hard disks) has led users to demand not only more and more space from their storage providers, but also indefinite storage. Users no longer have to selectively delete their e-mails to stay in a predefined space, so they keep them all, forever. They expect the same from the rest of their digital attics: they expect every bad poem, doodle, patent-idea-on-a-napkin, picture of their grandkids, etc. to be immediately available, forever.
Forever. Even if your disks die. Even if they accidentally delete them. Even if a meteor pummels your datacenter. The old standard of weekly backups has long passed the borders of Being Prudent, travelled through the Fields of Marginally Acceptable, and entered the Mountains of Irreparable Harm to Your Reputation. Users, customers, regulators, etc. are barely tolerant of losing a day of data, and this will get worse. In the next half-decade a truly monumental shift into multi-media backups, near-real-time data snapshots, and 100% protection of data assets will be fully realized, requiring several multiples more mixed-media backup storage than live data storage.
On the up-side, disk sizes are sky-rocketing, costs are plummeting and the reliability of the new serial ATA (SATA) architected drives has come up to a level that allows anyone to build in or expand networked disk with a trivial investment. A new generation of storage vendors is coming up, challenging the old way of thinking about networked storage and adopting technologies with more agility than their behemoth competitors. We’re quickly on our way to 1TB disk drives, flash-based storage continues to be refined and is nearing enterprise-grade, holographic storage is being commercially realized for some applications, and all of these technologies are driving the cost per megabyte down.