Can Cloud Storage be the Solution to Data Explosion?
Joseph Ortiz, Senior Analyst, Storage Switzerland, LLC
The data explosion: is there a way to keep pace with ever-increasing storage demands without breaking the budget and the company? Can Cloud Storage be the solution to this age-old problem?
Everyone in the technology sector is well aware of the pain IT staffs around the world are experiencing in this economic downturn. In many companies IT budgets are frozen or slashed, as are IT staffing levels. But the amount of data they are required to manage and protect continues to grow at an increasing pace, with some estimates putting data growth at 50% or more per year. Several factors are helping drive this phenomenal growth in data quantities, among them regulatory compliance (e.g. SOX), possible litigation and e-Discovery requirements, DR mandates, collaboration requirements for decentralized offices, and users saving all types of data “just in case”.
All of this new data, along with older existing data, requires more and more storage space, which increases the demands placed on IT staffs to provide that space as well as the means to manage and protect the data on it. This in turn increases operating costs as companies face the need to purchase still more servers and disk, which naturally increases their requirements for floor and rack space, power and cooling. All of this drives up acquisition and operating costs, straining budgets already stretched to the breaking point.
Another area impacted by this continuing data explosion is the cost of protecting all this data properly. As the data sets grow ever larger, the backup and recovery windows also keep increasing until backups and restores can no longer be completed in the time available. This makes it more difficult to meet DR objectives and time frames as well. To meet these backup and recovery needs, companies are faced with buying still more disk to hold all this data.
So is there a way to increase storage and accessibility for all this data without incurring major capital expenses, while still providing proper protection and control of that data? To answer that question we need to look at a couple of things. First, what technological developments are available to address these ever-increasing storage needs cost effectively? Second, what types of data would provide cost savings by being stored on these new technological developments?
First let us look at what technologies could address these increasing storage needs economically. Some companies are now looking at “cloud computing” as a possible solution to this dilemma. This is not a new technology but one which has been around for some time, and in the past has had names like storage as a service, storage utility or software as a service. Beyond the modernized name, the approach has been helped by wider Internet access and greater bandwidth, as well as by the development of some new capabilities and services.
So just what is “cloud computing” and what can it provide to address the storage problems faced by so many companies? The definition of “cloud computing” is in a state of flux as various people and groups refine it to reflect the new types of remote aggregator services now available on the Internet. In truth there may never be an agreed-upon use of the term.
Many of us who have been in the computer industry for a long time (since the early 80s) still remember “the cloud” as referring to the Internet itself. Essentially, “the cloud” meant any computing or data resources and services that were outside of your own intranet or LAN. You connected to them over the Internet one by one, on an as-needed basis. Old database services such as CompuServe, Delphi, Prodigy and a few others are examples of the “cloud” resources available back then.
But now we have moved beyond “the cloud” into “cloud computing”, which in its simplest definition is a style of computing that makes dynamically scalable and often virtualized services and resources available to users via the Internet. The users do not necessarily need any in-depth knowledge of or expertise in the technology infrastructure made available in “the cloud” in order to use it.
In the last few years it has become necessary to broaden our definition of “cloud computing” as new services and computing resources became available from various integrators and aggregators. A couple of the more familiar services are infrastructure as a service (IaaS) and software as a service (SaaS), along with Web 2.0 and other recent technology trends, all of which share the common theme of users relying on the Internet to satisfy their computing needs. These real-time services and resources are most frequently offered on a subscription or pay-per-use basis and extend IT’s existing capabilities. Some examples of software as a service providers are Salesforce.com, Google Apps and QuickBooks, to name a few. Then there are other companies that offer on-line backup of data as a service, such as Carbonite’s Online Backup, EMC’s Mozy, Seagate’s Evault and Iron Mountain Digital’s LiveVault.
More recently the abbreviation SaaS has taken on a second meaning: storage as a service. This new development is catching the attention of budget-constrained IT as a possible means to increase storage capacity and add other capabilities on the fly without the need to invest in new hardware, infrastructure or software, or to train new personnel. So now in addition to “cloud computing” we also have “cloud storage” as the latest service, one designed to address the ever-increasing storage needs of IT in the most cost-effective manner possible.
So what are some of the features that would need to be available in a “storage cloud” offering to make it a viable alternative for IT? It would need to:
•have a geographically aware infrastructure not tied to a specific geographic location
•present a single name space to users and administrators via the Internet
•provide highly and easily scalable, readily accessible storage on the fly
•be based on commodity components
•be application agnostic
•provide secure connections for all data transfers
•provide data dispersal to multiple locations
•deliver transparent file access to the geographically closest access point
•provide worldwide collaboration on documents and other files, with a local user experience
•be billed on a usage basis
•integrate easily and seamlessly with your infrastructure without the need for a custom API
•provide automatic migration of selected and inactive data to provider’s data centers
•provide a means to move massive amounts of data to and from the user rapidly
•provide readily accessible and qualified technical support
•provide high security levels physically and electronically for data you store there
Ideally you would also want a provider that has been around for a good while and that has a lot of experience in data storage and archiving on a massive scale.
As this is a fairly new area of “cloud computing”, there are only a few major providers so far. Among those are Amazon.com’s S3 (Simple Storage Service), Nirvanix, Rackspace Hosting Inc.’s Mosso Cloud Division, Vaultscape Inc., and Iron Mountain Inc.’s new VFS (Virtual File Store) service. Each of these companies’ offerings will have different strengths and weaknesses that will need to be evaluated going forward. Iron Mountain is the oldest of these companies and has been in the business of storing and archiving information for over 55 years.
Now that we have an idea of what “cloud storage” is and the various companies offering this service, we next need to determine what types of data we are dealing with and which would be good candidates for archiving on “cloud storage” services.
All data can be divided into two broad categories:
1.Active data – all data being used, modified and accessed frequently by users as well as data that is not being modified but is accessed frequently for reference or research purposes.
2.Inactive data – all data that is no longer being modified and is accessed infrequently or not at all.
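The two categories above can be approximated from file timestamps. The following is a minimal sketch, not a method described in this article: the 180-day threshold and the function names `is_inactive` and `inactive_fraction` are assumptions chosen for illustration, and access times are only a rough guide because some file systems are mounted so that they do not update them on every read.

```python
import os
import time
from pathlib import Path

# Assumption: a file counts as inactive when it has been neither modified
# nor accessed for this many days. 180 is an illustrative value only.
INACTIVE_AFTER_DAYS = 180


def is_inactive(path, now=None, threshold_days=INACTIVE_AFTER_DAYS):
    """Return True if the file was neither modified nor accessed recently."""
    now = time.time() if now is None else now
    st = os.stat(path)
    last_used = max(st.st_mtime, st.st_atime)
    return (now - last_used) > threshold_days * 86400


def inactive_fraction(root, now=None, threshold_days=INACTIVE_AFTER_DAYS):
    """Return the fraction of bytes under root that belong to inactive files."""
    total = inactive = 0
    for p in Path(root).rglob("*"):
        if p.is_file():
            size = p.stat().st_size
            total += size
            if is_inactive(p, now, threshold_days):
                inactive += size
    return inactive / total if total else 0.0
```

Running `inactive_fraction` against a file share gives a first estimate of how much of its capacity is held by data that is a candidate for archiving.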
In general, only 20% to 30% of data on most networks is active data. The other 70% to 80% is inactive. The problem with this inactive data is that it costs quite a bit to let it sit idle on expensive, high performance Tier 1 storage. Simply moving this inactive data, as it increases in quantity, to cheaper Tier 2 or Tier 3 storage in your data center will not eliminate other costs associated with storing and protecting this additional data. Some of those costs are:
•More disk and associated enclosures
•More servers to host the additional disk
•More rack space to hold the additional disk enclosures and servers
•More floor space to accommodate additional racks
•More cabling and switches added to your infrastructure to support the new hardware
•More electrical power to run the drives, servers and associated hardware
•More cooling capacity to protect the additional hardware
•More backup software licenses
•More personnel to manage all of this
•Training for new personnel that will manage this
Still, it may not be possible or even desirable to simply move all this inactive data from the Tier 1 disk to tape and store it off site in order to try to cut costs. But if you could move this inactive data to cheaper, long term, highly scalable storage that was readily accessible to users, you could possibly realize a significant cost savings by avoiding additional capital expenditures to expand your current storage and infrastructure. You would be able to purchase your storage on an as-needed basis as an ongoing monthly expense instead. This is the area cloud storage focuses on: it has the potential to let your company expand its storage on the fly to meet demand while keeping costs to a minimum.
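The trade-off between buying capacity up front and paying for it monthly can be explored with some simple arithmetic. The sketch below is illustrative only: the function names and every price passed to them are hypothetical inputs, not figures from any provider, and whether the monthly model comes out cheaper depends entirely on the numbers you supply.

```python
def monthly_usage_tb(start_tb, annual_growth, months):
    """Capacity in use at the start of each month, with compound growth."""
    monthly_factor = (1 + annual_growth) ** (1 / 12)
    return [start_tb * monthly_factor ** m for m in range(months)]


def pay_per_use_cost(usage_tb, price_per_tb_month):
    """Total spend when each month is billed on the capacity actually used."""
    return sum(tb * price_per_tb_month for tb in usage_tb)


def upfront_cost(usage_tb, purchase_price_per_tb, running_cost_per_tb_month):
    """Total spend when peak capacity is bought on day one and run throughout."""
    peak_tb = max(usage_tb)
    purchase = peak_tb * purchase_price_per_tb
    running = peak_tb * running_cost_per_tb_month * len(usage_tb)
    return purchase + running
```

For example, `monthly_usage_tb(10, 0.5, 36)` models 10 TB growing at 50% per year over three years; passing that list to both cost functions with your own prices shows how much of the up-front spend pays for capacity that sits unused in the early months.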
Cloud Archive
What would a cloud archive solution look like? Among other things, it should appear as just another server or mount point on your LAN. It should allow you to copy and move data to it, at LAN speed, as easily as you do to any other server or storage on your network. It should be simple to implement and integrate easily with your infrastructure. It should provide highly scalable, near infinite storage quickly and transparently on demand. It should support any type of data you can put on CIFS or NFS. It should be able to work with your existing backup, HSM and archive applications. It should provide some means of moving massive amounts of data between your site and the provider’s data center quickly. It should provide secure links for the transfer of data over the Internet. And it should provide secure physical and electronic storage of all data sent to the provider’s data center.
Iron Mountain’s Virtual File Store is a good example of this type of cloud archive service. VFS provides customers with a virtually infinite storage area to which they can easily move old data. It looks and acts just like any other file server on the user’s local network, and data can be moved to it as easily as to any other server or storage on your LAN. Additionally, archive applications can leverage VFS if they can use network mount points.
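Because an archive of this kind is presented as an ordinary mount point, moving data to it needs nothing beyond standard file operations. The sketch below assumes the archive is already mounted at some path on the local system; the function name `archive_file` and the paths in the usage note are hypothetical and do not come from any vendor documentation.

```python
import shutil
from pathlib import Path


def archive_file(src, source_root, archive_root):
    """Move src under archive_root, preserving its path relative to source_root."""
    src = Path(src)
    dest = Path(archive_root) / src.relative_to(source_root)
    dest.parent.mkdir(parents=True, exist_ok=True)  # recreate the folder layout
    shutil.move(str(src), str(dest))
    return dest
```

A call such as `archive_file("/data/projects/2005/report.doc", "/data", "/mnt/archive")` would place the file at `/mnt/archive/projects/2005/report.doc`, so users can find archived files under the same folder structure they already know.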
From an architectural viewpoint, Iron Mountain’s VFS solution uses a semi-cloud deployment model, which mixes on-premises and off-premises equipment. They place a NAS appliance on premises in your data center, which receives data locally from the LAN via CIFS or NFS. It then replicates this saved data over secure VPN connections to the Iron Mountain data center, where the copy is placed on their secure storage. Among the various features provided in the VFS service are:
•Virtually unlimited storage on demand
•Onsite cache
•High security data centers with redundant power and cooling
•Multiple sites
•Continuous data integrity and health checks
•Secure and fast movement of large amounts of data via a physical data shuttle
•Ability to implement retention policies
•Data migration and copies occur at LAN speed
•Transparent file access to end users and applications
•Integration with Windows Active Directory
•Simple configuration and remote management
•Data is secured in-flight and at rest
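The general pattern behind this deployment model, in which a local appliance accepts writes and later copies them to a remote site, can be shown with a toy model. This is a sketch of the pattern only and not Iron Mountain’s implementation: the class name `CachingGateway` is invented, a second local directory stands in for the provider’s data center, and the secure VPN transport, retention policies and integrity checks are all left out.

```python
import shutil
from collections import deque
from pathlib import Path


class CachingGateway:
    """Toy model of an on-premises appliance that replicates to a remote site."""

    def __init__(self, cache_dir, remote_dir):
        self.cache_dir = Path(cache_dir)
        self.remote_dir = Path(remote_dir)  # stand-in for the provider's data center
        self.pending = deque()              # relative paths not yet replicated

    def write(self, rel_path, data):
        """Store data in the local cache and queue it for replication."""
        target = self.cache_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        self.pending.append(rel_path)

    def replicate(self):
        """Copy every queued file to the remote site; return how many were sent."""
        sent = 0
        while self.pending:
            rel_path = self.pending.popleft()
            dest = self.remote_dir / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.cache_dir / rel_path, dest)
            sent += 1
        return sent

    def read(self, rel_path):
        """Serve from the local cache if present, otherwise from the remote copy."""
        local = self.cache_dir / rel_path
        source = local if local.exists() else self.remote_dir / rel_path
        return source.read_bytes()
```

In this model the write completes as soon as the local copy exists, which is why saving data can run at LAN speed, while replication to the remote site happens afterwards and reads fall back to the remote copy once a file is no longer in the local cache.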
This semi-cloud approach is a deployment model we think will become more and more prevalent over time as it provides some excellent advantages such as:
•No large capital outlay for new equipment, infrastructure, personnel and
•It “feels local” because of the on premise appliance.
•Scalable capacity on demand without a corresponding investment
•Low, predictable, manageable costs
•Avoids costs associated with equipment obsolescence
•Avoids additional power, cooling, space and storage costs
•Continuous data replication of saved data to secondary sites
•Easily deployed as a stand-alone solution or integrated with existing solutions
•Supports any client OS or application that can access data via CIFS or NFS
•Supports all data types that can reside on CIFS/NFS servers
•Can be used as target for off-the-shelf backup programs
•Can be incorporated into existing HSM and Archiving solution implementations
•Files are transmitted over secure VPN network to the remote data center
•Helps meet DR requirements
•Helps meet BC requirements
These various advantages and features should help to reduce your TCO, thus improving your ROI in this type of technology.
Monday, March 23, 2009