Clustering Tier One Storage
Clustered storage is now appearing at all levels of the storage environment. There are primary Tier 1 storage providers like 3PAR, NAS providers like OnStor, and nearline storage offerings like Permabit’s Enterprise Archive Solution. Clustered storage can overcome several key obstacles that storage professionals face today: scale, performance, reliability and upgradability.
Traditionally, storage platforms have been delivered in two flavors: high-end monolithic storage systems like Hitachi Data Systems’ Lightning and EMC’s Symmetrix, or more modular dual-controller systems. What these two platforms and clustered storage have in common is the need for storage compute power. Storage compute is the muscle these systems use to perform disk I/O operations such as RAID calculation and LUN management, and even to manage features like snapshots or replication. Although there are other factors, like disk spindle count and the speed of those disks, storage compute power is a key contributor to overall system performance.
Monolithic storage is so called because you buy most of the system (everything but the capacity) up front. Then, as your capacity needs expand, you add disks to the shell that you originally purchased. You can typically also add bandwidth to these systems in the form of additional storage ports. For the most part, though, the storage compute power is there from day one, and the only way to expand it is to replace the compute components. Monolithic systems are designed for databases and other applications that demand the highest levels of performance, and they are reserved for the high end of the data center. At some point, though, you can hit the limit to which the system was designed to scale, and an upgrade needs to occur.
Modular systems are typically offered with a module that contains just the storage compute power; disk shelf modules are then attached as capacity needs increase. As with monolithic storage, the only way to increase storage compute power is to replace the storage compute module. Most suppliers of modular solutions today can perform a data-in-place upgrade, where storage shelves from the existing system are connected to the new primary storage compute module, but none can scale their modular systems up into their monolithic systems. Modular systems cover a wide range of needs, from the small business up to the very large medium-sized business, and there is generally some overlap in performance capabilities between the highest-end modular systems and the lower-end monolithic systems.
The two solutions have created a storage gap. Monolithic storage systems are expensive, and their cost puts them out of reach of many large data centers. These data centers often end up using modular systems, where there is a constant need to upgrade the storage compute power. To compensate, the decision is often made to purchase multiple modular storage arrays, all of which need to be managed separately. To make matters worse, some IT departments can justify a monolithic system for some of their environment but not all of it, and they end up with a mix of systems. The systems use different operating systems, another gap, which makes for two entirely different management experiences.
Into these gaps step clustered storage systems, which leverage a grid architecture to deliver a cost-effective, high-performance solution. A clustered storage system typically has multiple interconnected nodes that form a single storage compute engine. Capacity is attached to these nodes in the form of disk shelves. A Tier 1 clustered storage system addresses several key issues that many data centers face: scale, performance, reliability and upgradability.
Scaling and Performance
The storage compute responsibility in a storage cluster is spread across the nodes in the cluster. If greater storage compute performance is needed, more nodes are added to the cluster. In Tier 1 systems, storage compute nodes can be added to the cluster independent of capacity. When a node is added to the environment, it is interconnected to the other nodes in the cluster and the performance load is transparently redistributed across all of the nodes. Capacity is added to Tier 1 clustered storage systems via disk modules, much as shelves are added to modular systems. The difference is that the capacity is managed by all of the storage compute nodes, not by a single storage controller. Tier 2 and nearline systems tend to scale capacity and performance at the same time, since each node also has storage. In Tier 2 and nearline, where capacity tends to be the primary driver for additional performance, this is a logical way to design the grid architecture.
Additional compute power is often needed to support additional capacity, but capacity is not the sole driver for additional storage compute needs. For example, as more users are added to a database application, the I/O requirements of the application can increase substantially while the actual data footprint remains relatively small. Similarly, in today’s virtual server environments, as more and more virtual machines are added to a physical host, capacity needs may not scale at the same pace that the disk I/O needs do. The ability to scale capacity and storage compute separately is critical in Tier 1 storage.
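As a rough illustration of scaling compute and capacity separately, the sketch below sizes the node count from the I/O requirement and the shelf count from the capacity requirement. The function name `size_cluster` and all of the figures (IOPS per node, terabytes per shelf, the workload numbers) are hypothetical values chosen for the example, not vendor specifications.

```python
import math

def size_cluster(required_iops, required_tb, iops_per_node, tb_per_shelf):
    """Return (nodes, shelves), each sized from its own requirement."""
    nodes = math.ceil(required_iops / iops_per_node)   # driven by I/O only
    shelves = math.ceil(required_tb / tb_per_shelf)    # driven by capacity only
    return nodes, shelves

# Hypothetical figures, for illustration only.
IOPS_PER_NODE = 50_000
TB_PER_SHELF = 10

# A database gains users: I/O demand quadruples, the data footprint barely moves.
before = size_cluster(80_000, 18, IOPS_PER_NODE, TB_PER_SHELF)
after = size_cluster(320_000, 20, IOPS_PER_NODE, TB_PER_SHELF)
print(before)  # (2, 2)
print(after)   # (7, 2)
```

In this simplified model the node count grows from two to seven while the shelf count stays at two, which is the case a design that couples compute to capacity would handle poorly.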
Reliability
Both modular and monolithic systems have excellent reliability and eliminate single points of failure. Clustered storage systems can leverage their grid architecture to go beyond eliminating single points of failure and provide multiple layers of redundancy. In Tier 1 storage, near-100% system uptime is a customer expectation, and rightfully so. Maintaining near-normal production performance is also becoming a key requirement. While it does not typically mean loss of data, a controller failure, especially in a traditional dual-controller modular system, means a loss in performance of as much as 50%.
Clustered storage systems can be configured based on the customer’s need. If a temporary loss of 50% of storage I/O performance is acceptable, then a two-node system may be acceptable. If not, the system can be scaled out further; for example, a six-node system would lose only 1/6 of its disk I/O capability on a node failure. Unique to a clustered storage system, two of the six nodes could fail and data access would still be maintained. And, as mentioned above, when all the nodes are active they are all involved in delivering performance, so there is no need for a standby node.
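The arithmetic behind these figures can be sketched in a few lines. This is a simplified model that assumes the load is spread evenly across identical nodes; the function name `remaining_performance` is illustrative only.

```python
from fractions import Fraction

def remaining_performance(nodes, failed=1):
    """Fraction of storage compute left after `failed` nodes drop out.

    Assumes identical nodes with the load spread evenly across them.
    """
    if failed >= nodes:
        return Fraction(0)
    return Fraction(nodes - failed, nodes)

for n in (2, 4, 6):
    left = remaining_performance(n)
    print(f"{n} nodes: {float(left):.0%} of performance remains after one failure")

# Two failures in a six-node cluster still leave 2/3 of the compute.
print(remaining_performance(6, failed=2))
```

Under these assumptions a two-node system keeps half of its performance after a failure, while a six-node system keeps five sixths.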
The additional nodes also help in another area of reliability that requires a substantial amount of compute power: recovery from a drive failure in an array. RAID rebuilds can require substantial storage compute resources. As a result, the IT professional often has to choose between sacrificing application performance to complete the RAID rebuild quickly and slowing the rebuild down so that acceptable application performance can be maintained. In a clustered storage system, the RAID rebuild compute requirement is spread across all available nodes, so the rebuild can be completed quickly while still maintaining near-normal levels of application performance.
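The trade-off can be made concrete with a deliberately simple model: treat the rebuild as a fixed amount of compute work and ask what share of each node must be diverted from applications to finish by a deadline. The function `share_for_deadline`, the work units, and the per-node rate are all hypothetical and are not taken from any particular product.

```python
def share_for_deadline(work_units, nodes, units_per_node_hour, deadline_hours):
    """Share of each node's compute (0..1) needed to finish the rebuild on time.

    Assumes the rebuild work divides evenly across identical nodes.
    """
    return work_units / (nodes * units_per_node_hour * deadline_hours)

# Hypothetical figures, for illustration only.
WORK = 480      # total rebuild work, in arbitrary compute units
RATE = 100      # compute units one node can deliver per hour
DEADLINE = 8    # hours allowed for the rebuild

for n in (2, 6):
    share = share_for_deadline(WORK, n, RATE, DEADLINE)
    print(f"{n} nodes: each gives up {share:.0%} of its compute to the rebuild")
```

In this model, tripling the node count cuts the per-node rebuild share to a third for the same deadline, leaving more compute for application I/O. Real rebuilds are also bounded by disk throughput, which the sketch ignores.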
Upgradability
When additional performance capabilities are needed, traditional modular systems must be upgraded by replacing the storage compute engine, essentially upgrading the controllers. The good news is that most manufacturers of modular systems can now perform data-in-place upgrades, so as long as you stay within the same product family you do not have to go through a painful migration. In many cases, though, there is downtime, and you also bear the expense of the older controller. While some manufacturers offer a trade-in value for the controller, it is not going to match the amount originally paid for it.
Monolithic systems will typically accept additional I/O ports, and some allow additional compute power, so upgrades can be done within the unit. It is still possible to exceed the I/O capability of the monolithic system itself, at which point an upgrade similar to that of a modular system would have to occur. As stated before, the primary issue with monolithic systems is that the base systems are expensive, as are the upgrades to them, but until the advent of clustered storage systems there was no alternative to consider.
As discussed in the performance section, clustered storage systems are upgraded by adding nodes to the cluster. There is no downtime in adding a node; it is merely plugged into the grid and consolidated into the cluster. It also seems possible that, over time, as faster compute nodes are developed, there may be the opportunity to intermix nodes of different performance capabilities within the same grid.
A Tier 1 clustered storage system fills the gap between high-end modular systems and monolithic systems. At about the same price point as higher-end modular systems, clustered storage systems offer the enhanced performance, reliability and upgradability of monolithic systems, and in some cases can exceed them.
Friday, October 24, 2008