NetApp Flash Pools – Good enough?
Nimble’s CASL IS revolutionary. Not because of the way they use high-capacity SSDs, but because of the way they use ‘slow’ 7.2K spinning drives.
Why is that important? Because a traditional storage array can achieve only about 70 IOPS per 7.2K HDD at sub-20ms latency. Try to push more IOPS out of these drives and latency goes up…considerably.
Nimble can achieve about 10,000 IOPS per 7.2K HDD. This is not a typo. TEN THOUSAND per drive! Don’t believe me? Try the CS700, which comes with 12 x 7.2K HDDs, for yourself (contact us for information).
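To put those per-drive numbers in context, here is a rough back-of-the-envelope calculation. It assumes a typical average seek time of 8.5 ms for a 7.2K drive (a datasheet-style assumption, not a figure from this post) and shows why a spindle that must seek and rotate for every random I/O tops out well under 100 IOPS, and what 12 such drives add up to at 70 versus 10,000 IOPS each.

```python
# Back-of-the-envelope IOPS arithmetic for a 12-drive shelf of 7.2K HDDs.
# ASSUMPTION: 8.5 ms average seek is a typical figure for 7.2K drives,
# not a number taken from this post.

RPM = 7200
AVG_SEEK_MS = 8.5   # assumed typical average seek time
DRIVES = 12         # drive count of the CS700 mentioned above


def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def mechanical_iops(rpm: float, seek_ms: float) -> float:
    """Ceiling on random IOPS if every I/O pays one seek plus half a rotation."""
    return 1000 / (seek_ms + rotational_latency_ms(rpm))


per_drive = mechanical_iops(RPM, AVG_SEEK_MS)
print(f"rotational latency: {rotational_latency_ms(RPM):.2f} ms")  # 4.17 ms
print(f"mechanical ceiling per drive: {per_drive:.0f} IOPS")        # 79 IOPS
print(f"12 drives at 70 IOPS each: {DRIVES * 70} IOPS")             # 840 IOPS
print(f"12 drives at 10,000 IOPS each: {DRIVES * 10_000:,} IOPS")   # 120,000 IOPS
```

The mechanical ceiling is why conventional arrays add spindles to add IOPS; any design that gets far beyond it per drive must avoid paying a seek for every random I/O.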
So how do they achieve this SSD-level performance from slow HDDs when others cannot? It comes down to a combination of the underlying CASL filesystem and inline efficiency techniques such as compression, coalescing and sequentialisation, all driven by today’s powerful multicore processors.
They have managed to decouple IOPS from disk spindles and derive performance from the CPU instead, so as each new generation of processors is released, IOPS per disk will only go up.
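The general idea behind coalescing and sequentialisation can be illustrated with a small log-structured writer. This is a hypothetical sketch, not Nimble's CASL implementation: the class name, the 8-block segment size and the use of `zlib` for compression are all assumptions chosen for illustration. It buffers random block writes in memory, compresses them inline and flushes them as one sequential append, so the disks see a few large writes instead of many small random ones, with the CPU doing the extra work.

```python
import random
import zlib


class CoalescingWriter:
    """Hypothetical sketch of inline write coalescing into a sequential log.

    Random block writes are buffered in memory, compressed, and flushed as a
    single sequential append. This is NOT Nimble's CASL code.
    """

    def __init__(self, stripe_blocks: int = 8):
        self.stripe_blocks = stripe_blocks           # blocks per flushed segment (assumed)
        self.buffer: dict[int, bytes] = {}           # logical block address -> pending data
        self.log = bytearray()                       # stand-in for the on-disk sequential log
        self.index: dict[int, tuple[int, int]] = {}  # LBA -> (offset, length) in the log
        self.disk_writes = 0                         # physical write operations issued

    def write(self, lba: int, data: bytes) -> None:
        # Rewriting an LBA that is still buffered just replaces the pending copy.
        self.buffer[lba] = data
        if len(self.buffer) >= self.stripe_blocks:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        base = len(self.log)
        segment = bytearray()
        for lba, data in self.buffer.items():
            compressed = zlib.compress(data)         # inline compression
            self.index[lba] = (base + len(segment), len(compressed))
            segment += compressed
        self.log += segment                          # one sequential append per segment
        self.disk_writes += 1
        self.buffer.clear()

    def read(self, lba: int) -> bytes:
        if lba in self.buffer:
            return self.buffer[lba]
        offset, length = self.index[lba]
        return zlib.decompress(bytes(self.log[offset:offset + length]))


# 64 writes scattered across a 10,000-block address space...
random.seed(1)
writer = CoalescingWriter(stripe_blocks=8)
for lba in random.sample(range(10_000), 64):
    writer.write(lba, bytes([lba % 256]) * 4096)
# ...reach the "disk" as 8 sequential segment writes.
print(writer.disk_writes)  # 8
```

A real log-structured filesystem also needs garbage collection to reclaim space left behind by overwritten blocks, which this sketch deliberately omits.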
So what’s all this got to do with the title of this post?
Well, Nimble’s ability to drive so much performance without relying on lots of flash, whilst providing lots of capacity with so few components, presents a real challenge for competing vendors. They are trying to find ways to deliver high performance without significantly increasing the number of disks, typically by adding SSDs to their configurations.
Some simply use the SSDs as a separate high-performance pool, which speeds up the volumes that reside within it; volumes that reside in other, non-SSD pools do not benefit.
Others have developed hybrid pools that mix SSDs with HDDs and move hot blocks of data to the appropriate disk tier, either through scheduled post-process jobs or inline.
This brings me to NetApp Flash Pools. A Flash Pool is a hybrid pool (or aggregate, as NetApp calls it) of SSDs and HDDs that accelerates, in real time, the data in the volumes residing within it.
Please note: I am not having a dig at NetApp. They have been very clear about how their technology works and have documented it extremely well, with this information readily available to the public, as I will demonstrate throughout this post. My concern is with some of their resellers in the UK who are not representing NetApp or its technology accurately, often misleading customers about how Flash Pools work, whether intentionally, to satisfy performance requirements stipulated by the customer, or because they simply haven’t properly understood how Flash Pools work.
Flash Pools are being sold as the be-all and end-all for satisfying both read and write performance requirements on a NetApp system (I know this from a discussion with a potential customer).
I will reference NetApp’s own documentation that explains how Flash Pools work, and I think you will see that it is not as straightforward as you might originally have been led to believe.
Flash Pool Design and Implementation Guide, March 2014
1. Flash Pools are aggregates made up of SSDs and HDDs and are subject to RAID and hot-spare rules. Flash Pools are typically sold in SSD + SATA configurations for the best performance/capacity combination, although they can also be configured with SSD + 10K SAS disks (for better performance?).
2. Flash Cache (PAM) and Flash Pools are not the same. Flash Cache accelerates only random read I/O, but it does so for all volumes on the system, regardless of which aggregate they live in. A Flash Pool benefits only the volumes that live within it.
3. Flash Pools accelerate only a specific type of write activity.
During the sales cycle, customers are often led to believe that the SSDs in a Flash Pool will provide a cache that accelerates both reads AND writes; however, after reading the document, you will see that it isn’t that clear-cut.
Flash Pools are aggregates made up of SSDs and HDDs that provide both capacity and increased performance. The SSDs in a Flash Pool accelerate random read activity, but accelerate only random overwrites of blocks smaller than 16KB. NetApp does use SSDs to accelerate writes, but only a small subset of write activity.
On page 5 they state that Flash Pools will NOT do the following:
- Accelerate write performance: They state that their system is already optimised for write performance; however, we know that this is not sufficient, and certainly not to a level where it can deliver up to 125,000 IOPS from 12 x 7200 RPM drives!
- Reduce or alleviate high CPU or memory utilisation: In fact, they state that using SSDs will consume more resources.
- Cache sequential (read or write) or large-block (greater than 16KB) random write operations: So the only writes it caches are random overwrites, and only those smaller than 16KB.
“Flash Pool is specifically targeted at accelerating repeat random read operations and offloading small-block random overwrite operations (which are a specific class of writes) from HDDs.”
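The rules quoted from page 5 can be summarised as a simple eligibility check. This is a hypothetical model written from the description in this post, not NetApp code: the `IORequest` type and `flash_pool_cacheable` function are invented for illustration, and treating exactly-16KB overwrites as ineligible (following the "smaller than 16KB" wording) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class IORequest:
    op: Op
    size_kb: int
    sequential: bool
    overwrite: bool = False  # True if this write replaces a block already written


# Threshold as described in this post; exact boundary handling is an assumption.
SMALL_BLOCK_LIMIT_KB = 16


def flash_pool_cacheable(req: IORequest) -> bool:
    """Hypothetical model: may the Flash Pool SSDs act on this I/O at all?"""
    if req.sequential:
        return False  # sequential reads and writes are not cached
    if req.op is Op.READ:
        return True   # random reads are eligible; the benefit comes on repeat reads
    # Random writes: only small-block overwrites are offloaded to SSD.
    return req.overwrite and req.size_kb < SMALL_BLOCK_LIMIT_KB


workload = [
    IORequest(Op.READ, 8, sequential=False),                   # random read
    IORequest(Op.WRITE, 8, sequential=False, overwrite=False),  # first random write
    IORequest(Op.WRITE, 8, sequential=False, overwrite=True),   # small overwrite
    IORequest(Op.WRITE, 64, sequential=False, overwrite=True),  # large overwrite
    IORequest(Op.WRITE, 256, sequential=True),                  # sequential write
]
print([flash_pool_cacheable(r) for r in workload])
# [True, False, True, False, False]
```

Only one of the four write patterns above is eligible, which is the point the quoted passage makes about writes being "a specific class".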
So what exactly is an overwrite? To summarise the lengthy write-up:
When a random write comes into the system, it is written to HDD (not to the SSDs in the Flash Pool). Only if a subsequent overwrite of that data is requested is it inserted into the SSD cache in the Flash Pool, and even then it is subject to data age and other rules. A workload with a high proportion of recurring random writes, that is, data that has a short lifetime and is likely to be overwritten (e.g. a pagefile or a temporary database), would benefit from this type of caching. This works well for OLTP databases, but it is difficult to see exactly which of your data would benefit from Flash Pools, and to what extent.
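The first-write-to-HDD, overwrite-to-SSD behaviour described above can be simulated to show why workload shape matters so much. This is a deliberately simplified, hypothetical model: `simulate_writes` is an invented name, every write is treated as random, and the data-age, eviction and other cache rules mentioned above are ignored.

```python
def simulate_writes(lbas, block_kb=4, limit_kb=16):
    """Count where random writes land under a simplified overwrite-caching model.

    Hypothetical sketch: the first write to a block goes to HDD; later small-block
    overwrites of that block go to the SSD cache. Data-age and eviction rules
    are deliberately ignored.
    """
    written = set()  # blocks that have already been written at least once
    hdd_writes = ssd_writes = 0
    for lba in lbas:
        if lba in written and block_kb < limit_kb:
            ssd_writes += 1  # small overwrite of an existing block
        else:
            hdd_writes += 1  # first write, or a block too large to cache
            written.add(lba)
    return {"hdd": hdd_writes, "ssd": ssd_writes}


# Pagefile-like: 10 blocks rewritten over and over.
print(simulate_writes([i % 10 for i in range(1000)]))                # {'hdd': 10, 'ssd': 990}
# Every write lands on a new block: nothing is ever an overwrite.
print(simulate_writes(range(1000)))                                  # {'hdd': 1000, 'ssd': 0}
# Same pagefile pattern, but with 64KB blocks.
print(simulate_writes([i % 10 for i in range(1000)], block_kb=64))   # {'hdd': 1000, 'ssd': 0}
```

The same number of writes can be almost entirely SSD-absorbed or not absorbed at all, depending only on whether blocks are rewritten and how large they are.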
Page 9 goes into detail about cache policies. In ONTAP, the administrator is required to configure cache policies for each Flash Pool, and I suspect some fine-tuning will be needed over time. In Nimble, the cache is already tuned and no user intervention is required.
Page 10 covers configuration and administration; it is worth reading to understand how Flash Pools are composed and which RAID protection rules apply.
Page 11: “The SSD cache capacity is a pool of capacity that is available to any volume in the aggregate”. So only volumes in the Flash Pool will benefit from the SSDs; volumes in other aggregates will not be accelerated.
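The scoping rule on page 11 can be expressed as a tiny model of a controller's aggregates. The aggregate names, volume names and SSD count below are hypothetical, and the model covers only Flash Pool scope (Flash Cache is not modelled).

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    name: str
    volumes: list[str]
    ssd_count: int = 0  # > 0 means this aggregate is a Flash Pool (SSD + HDD)

    @property
    def is_flash_pool(self) -> bool:
        return self.ssd_count > 0


def ssd_accelerated_volumes(aggregates: list[Aggregate]) -> set[str]:
    """Volumes that can use Flash Pool SSD capacity: only those inside a Flash Pool."""
    return {vol for aggr in aggregates if aggr.is_flash_pool for vol in aggr.volumes}


# Hypothetical layout: one hybrid aggregate, one SATA-only aggregate.
aggrs = [
    Aggregate("aggr_hybrid", ["sql_data", "pagefiles"], ssd_count=4),
    Aggregate("aggr_sata", ["file_shares", "vdi"]),
]
print(sorted(ssd_accelerated_volumes(aggrs)))  # ['pagefiles', 'sql_data']
```

Moving `vdi` into the accelerated set would mean migrating it into the hybrid aggregate (or building another Flash Pool), which is exactly the sizing question raised in the next paragraph.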
Also review the rules on the minimum and maximum number of SSDs, the number of Flash Pools and the maximum Flash Pool size; these vary by controller model and ONTAP version.
I could continue interrogating this document paragraph by paragraph to highlight the limitations, but the bottom line is that Flash Pools are not the answer to the biggest issue customers face today: how to handle a high volume of random write I/O at acceptable latency to deliver good end-user performance. In their own words, NetApp have said they do not do this, which raises the question: is their proposed solution capable of delivering high write IOPS, with or without the SSDs?
This is not FUD. I have specifically referenced NetApp passages for credibility and validation. We encourage fair and responsible representation of information in our community to help users make educated technology decisions that enhance their businesses.