Compression Perfection: Optimizing Performance and Efficiency

One of the coolest and most anticipated features of the new VMAX All Flash software release is compression. At a high level, it allows the array to minimize the amount of space required to store an application's data based on how compressible that data is. The general rule of thumb is a compression ratio of about 2:1 for the typical transactional apps and databases that rely on VMAX for high-end, mission-critical storage.
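To put that rule of thumb in numbers, here's a quick back-of-the-envelope sketch; the capacity figures are illustrative, not measured VMAX results:

```python
# Rough effective-capacity math for a 2:1 compression ratio.
# The raw capacity here is an illustrative assumption.
raw_flash_tb = 100.0        # usable physical flash (hypothetical)
compression_ratio = 2.0     # typical 2:1 for transactional workloads

effective_tb = raw_flash_tb * compression_ratio
savings_pct = (1 - 1 / compression_ratio) * 100

print(f"Effective capacity: {effective_tb:.0f} TB")   # 200 TB
print(f"Physical space saved: {savings_pct:.0f}%")    # 50%
```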

The ability to apply compression when storing data is not new; it has been around for decades, going as far back as T1 networks and tape drives. What's cool about VMAX compression is how science and innovation are used not just to consume less physical storage, but to maintain an optimized balance between performance and efficiency. Here's how.

What is it?

Here’s the “skinny” on compression. It’s a standard VMAX All Flash feature and included with the system (i.e. it’s “free”). Compression can be enabled or disabled at the storage group level and can be turned on or off at any time.  More importantly, it is completely transparent to the host, apps, and array data services, meaning nothing in the stack needs to be changed other than checking the “compress box” for your storage groups.

VMAX applies track-level inline compression via a combination of a hardware accelerator and sophisticated software. It compresses data as it is written to the backend storage and decompresses it as it's read by the host. Together, the hardware accelerator and software maximize space efficiency while maintaining consistent, high performance levels. Pretty cool.

Why is it Cool?

All compression technologies are not created equal. Depending on the implementation, compressing data, whether inline or post-process, can impose a performance overhead on the array even when the data is highly compressible. There's also the process of inflating compressed data when an app needs to read it, which creates further overhead. How does VMAX address these potential performance impacts? The simple answer is "cache," and here's why.

VMAX uses sophisticated, intelligent algorithms to optimize the amount of active data resident in cache, providing consistent sub-millisecond response times. This significantly reduces the overhead of compressing and decompressing data because most IOs are serviced directly out of cache. VMAX caches all writes and generally services them within 200-300 microseconds (.2 to .3 milliseconds). Read hits from cache are also generally serviced in the 200-300 microsecond range. The result is zero performance impact from compression when data is written to or read from cache. On VMAX, cache hits typically account for 50-70+% of total IO.

So what's the impact of a read miss, where the data is read from the backend flash and decompressed? The typical overhead for decompressing read-miss data is in the range of 100-200 microseconds (.1 to .2 milliseconds). For most apps, this overhead is undetectable, especially when moving from an array based on a mix of mechanical drives. And even though the overhead is already minimal, VMAX does some really clever things to shrink it further via its Adaptive Compression Engine.
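To see how little those numbers move the needle, here's a rough sketch of average service time using the figures above. The 500-microsecond flash read time and 60% cache-hit rate are assumptions for illustration, not VMAX specifications:

```python
# Back-of-the-envelope average latency: cache hits at ~250 us,
# read misses pay a flash access plus ~150 us decompression overhead.
# The flash read time and hit rate below are illustrative assumptions.
def avg_latency_us(cache_hit_rate, cache_us=250.0,
                   flash_read_us=500.0, decompress_us=150.0):
    miss_rate = 1.0 - cache_hit_rate
    return cache_hit_rate * cache_us + miss_rate * (flash_read_us + decompress_us)

with_compression = avg_latency_us(0.6)                    # 410 us
without_compression = avg_latency_us(0.6, decompress_us=0.0)  # 350 us

print(f"avg with compression:    {with_compression:.0f} us")
print(f"avg without compression: {without_compression:.0f} us")
```

Even with a modest 60% hit rate, the decompression overhead adds only a few dozen microseconds to the blended average, well under a millisecond either way.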

The VMAX “ACE” in the Hole (aka the Adaptive Compression Engine)

The Adaptive Compression Engine comprises several internal features designed to optimize space efficiency and performance. First is a Hardware Accelerator, enabled via an onboard card included with every VMAX V-Brick. Next is the intelligent Activity Based Compression algorithm, which identifies active data address ranges to bypass compression when array resources and space are available. There is also Fine Grain Data Packing, which chunks up data extents written to the array to provide increased read and write granularity and parallelization. Finally, there's Optimized Data Placement, which "bucketizes" data across the most efficient, right-sized compression pools based on the compressibility of the data. Let's look at each of these in more detail.

The Science of the VMAX Adaptive Compression Engine

Hardware Acceleration is based on the onboard compression card, which offloads compression and decompression tasks from the CPU cores on the directors. The technology is not new to VMAX: similar hardware accelerators have been used for several years with SRDF, where compression minimizes the amount of data sent across the network to the remote array. Here, it minimizes the amount of data written to flash. The result is zero performance impact when compressing data being written to flash.

Activity Based Compression identifies the busiest address ranges and keeps up to the busiest 20% of them uncompressed as long as space and resources are available. The algorithms used to identify these active addresses are based on some of the intelligence originally developed for FAST. As data is written to flash, if it is likely to be reread, ABC (easy as 1-2-3) bypasses the compression card. This eliminates the overhead of decompressing data during a read request for active data that is not located in cache. The key point is that this is done only for the most active data, and only if space and resources are available. It's a pretty innovative way to intelligently optimize the balance between performance and space efficiency.
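A minimal sketch of an activity-based decision like the one described, assuming a hypothetical extent-level hit counter and a 20% hot threshold; the names and mechanics here are illustrative, not the actual VMAX implementation:

```python
# Hypothetical sketch: track access counts per extent and bypass
# compression for the hottest 20% while free space remains.
from collections import Counter

class ActivityTracker:
    def __init__(self, hot_fraction=0.2):
        self.hits = Counter()
        self.hot_fraction = hot_fraction

    def record_access(self, extent_id):
        self.hits[extent_id] += 1

    def should_bypass_compression(self, extent_id, space_available=True):
        # Only bypass when space/resources allow and the extent is hot.
        if not space_available or not self.hits:
            return False
        hot_count = max(1, int(len(self.hits) * self.hot_fraction))
        hottest = {e for e, _ in self.hits.most_common(hot_count)}
        return extent_id in hottest

tracker = ActivityTracker()
for _ in range(100):
    tracker.record_access("extent-7")          # very hot extent
for e in ("extent-1", "extent-2", "extent-3", "extent-4"):
    tracker.record_access(e)                   # touched once each

print(tracker.should_bypass_compression("extent-7"))  # True
print(tracker.should_bypass_compression("extent-2"))  # False
```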

Fine Grain Data Packing allows tracks to be compressed for finer-grained reads and writes. It breaks a 128K address range being written to storage into four 32K segments. The increased granularity helps writes as well as reads, since smaller data chunks have to be moved and decompressed from the backend flash. It also provides more performance and throughput by parallelizing compression and decompression across four lanes in the hardware accelerator to spread out the compression operation. The result is higher efficiency and lower read latency, especially for small block IOs.
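The packing idea can be sketched like this, with zlib standing in for the hardware accelerator; the segment layout follows the text, but the code itself is purely illustrative:

```python
# Sketch of fine-grain data packing: split a 128K track into four 32K
# segments and compress each independently, so a small read touches
# only the one segment it needs.
import zlib

SEGMENT = 32 * 1024  # 32K

def pack_track(track: bytes) -> list:
    assert len(track) == 4 * SEGMENT, "expects a full 128K track"
    return [zlib.compress(track[i:i + SEGMENT])
            for i in range(0, len(track), SEGMENT)]

def read_segment(packed: list, offset: int) -> bytes:
    # Decompress only the segment covering the requested offset.
    return zlib.decompress(packed[offset // SEGMENT])

track = b"A" * SEGMENT + b"B" * SEGMENT + b"C" * SEGMENT + b"D" * SEGMENT
packed = pack_track(track)
assert read_segment(packed, 40 * 1024) == b"B" * SEGMENT  # one segment read
```

In a real accelerator the four segments would be compressed in parallel lanes; here the list comprehension processes them sequentially for simplicity.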

Optimized Data Placement leverages multiple backend compression pools to ensure the best compression ratio is achieved, minimizing the flash space required. All data can be compressed, but not all data compresses equally well: some data sets compress to one size, others to another. To maximize compression efficiency, multiple compression pool sizes need to be available.

The compression pools are dynamically adjusted based on the compressibility of the data sets being written, providing very high levels of backend storage efficiency. In addition, extent pools are spread across all backend resources to optimize overall system performance.

And In Conclusion…

The Adaptive Compression Engine within VMAX All Flash systems offers a multitude of benefits. Inline compression and intelligent algorithms, paired with compression hardware acceleration, have pushed VMAX All Flash to new levels of efficiency and performance. IT organizations love their new VMAX All Flash arrays: they enable compression with a simple click, then let the array do the work, delivering compression perfection.
