Hypothetical question: if your storage array is 50% allocated, is it half empty or half full? The pessimist would say it’s half empty. The optimist would say it’s half full. But most of the storage folks I talk to would say, “Who cares?”
In their minds the key challenge is not usually running out of storage capacity. With space efficiency technologies like thin provisioning, reclaim, compression and dedupe, snapshots, tiering, etc… there are lots of proven ways to optimize capacity utilization.
The issue keeping many storage folks awake at night is not knowing whether they have the capacity AND performance to meet an app’s needs. Sure, there’s plenty of capacity available to add 2TB for the new database. But… Are there enough array resources (compute, memory, bandwidth, IOPS, etc.) to deliver the needed performance? How will adding this new app impact the others already running on the array? How do I know if the apps are running in compliance with their desired performance service levels? And how do I know when I’m running out and need to add more resources?
These are some pretty meaty technical issues. Without the right storage skills, performance tools, and understanding of different app workload types, it’s not easy to know if or when they might be a problem. The notification is usually when the phone rings and there’s a concerned user on the other side wanting to know why their apps are running slow. I’ve heard storage folks describe this situation many times, over many years. They all have the same, simple request. Fix it.
And to really fix it, it takes a new approach to how the system is designed, packaged, configured, and managed. It’s a big part of the innovation that has gone into the VMAX3 architecture, as well as a key strategy focus across all EMC storage platforms. To fix it means turning the traditional provisioning process upside down. Today, we often build the storage bucket and start pouring in apps until it overflows. Knowing how many more apps you can pour in, or when you are about to hit that overflow point is, to use a technical term, hard.
Provisioning by service level (or in EMC lingo, “SLO Provisioning”) lets the app guys request a certain level of service, say “Gold” to get a 5ms average response time. The storage system then automatically sizes the request and determines whether the storage infrastructure has enough resources (ports, engines, cache, disks, etc.) available to deliver it.
By running this admissibility check, the system knows whether it can safely commit the provisioning request. Not only will it tell you whether it can deliver the 5ms response time, but also whether it can do so without impacting the other apps already running on the array. The key point is that the request is validated before the storage is provisioned. It’s sort of like checking your bank account balance to make sure you have enough to cover the check you are writing.
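For the code-minded, here is a minimal sketch of what a “validate before you commit” check could look like. This is a toy model, not how the array actually does it: the resource names, the units, and the ArrayState, admissible, and provision names are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class ArrayState:
    # Hypothetical per-resource limits and current commitments (arbitrary units).
    capacity: dict
    committed: dict

def admissible(state, demand):
    """True if every resource can absorb `demand` without exceeding its limit."""
    return all(
        state.committed.get(r, 0) + need <= state.capacity.get(r, 0)
        for r, need in demand.items()
    )

def provision(state, demand):
    """Commit the request only if the check passes; report whether it was committed."""
    if not admissible(state, demand):
        return False
    for r, need in demand.items():
        state.committed[r] = state.committed.get(r, 0) + need
    return True

# Assumed numbers: plenty of space for a 2TB database, but not enough IOPS headroom.
state = ArrayState(
    capacity={"iops": 100_000, "cache_gb": 512, "capacity_tb": 100},
    committed={"iops": 70_000, "cache_gb": 300, "capacity_tb": 50},
)
new_db = {"iops": 40_000, "cache_gb": 64, "capacity_tb": 2}
print(admissible(state, new_db))  # prints False
```

Note that the request fails on IOPS even though the array is only half full on capacity, which is exactly the “who cares if it’s half full?” point.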
If it can’t deliver Gold, the system can tell you what’s actually available and what performance you can deliver. So while you may not have enough left to deliver Gold at 5ms, you may have enough to deliver Silver at 10ms. And if you really do need more Gold, the system can advise you what to add to get it (i.e., more front-end, back-end, or disk resources). Again, like your bank account, if you only have $50 and you need to write a check for $100, the system tells you that you need to deposit another $50.
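The “what can I get, and what would I need to add?” logic can be sketched in a few lines. Again, this is only an illustration under assumptions: the Gold and Silver response-time targets come from the example above, but the per-level resource demands, the headroom numbers, and the shortfall and advise helpers are invented.

```python
# Hypothetical resource demands for the same workload at each service level,
# ordered from most to least demanding: (name, target response time in ms, demand).
LEVELS = [
    ("Gold", 5, {"front_end": 40, "back_end": 30, "disk": 50}),
    ("Silver", 10, {"front_end": 20, "back_end": 15, "disk": 25}),
]

def shortfall(headroom, demand):
    """Resources that must be added before `demand` fits (empty dict if it fits)."""
    return {
        r: need - headroom.get(r, 0)
        for r, need in demand.items()
        if need > headroom.get(r, 0)
    }

def advise(headroom):
    """Return (best level that fits, its target ms, what to add to reach the top level)."""
    best = next(
        ((name, ms) for name, ms, demand in LEVELS if not shortfall(headroom, demand)),
        (None, None),
    )
    return best[0], best[1], shortfall(headroom, LEVELS[0][2])

headroom = {"front_end": 25, "back_end": 40, "disk": 30}
print(advise(headroom))  # prints ('Silver', 10, {'front_end': 15, 'disk': 20})
```

In bank account terms, the last element of the result is the deposit you need to make before the Gold check will clear.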
And just like overdraft protection, the system won’t stop you from asking for Gold and provisioning it, even if you don’t have enough. It will tell you that you shouldn’t. And if you do it anyway, the system will try to borrow the resources from other apps if they are available. But the system can’t guarantee that Gold can be delivered for this app.
Again, the part that’s changed is that now the storage folks know when they are running out. There’s no magic bullet to prevent the bucket from getting full, or to make sure all your storage “checks” will always be covered by what’s in your account. But at least now the storage folks know before it happens and, if it does, how to fix it.
So is SLO provisioning a game changer? I guess it depends on your perspective. Here’s a good analogy (apologies to the millennials for going old school). Remember your gas gauge in your first car? It told you when you had half a tank of gas. While you knew when it hit “E” you were out of gas, you didn’t really know how far you could go before you needed to gas up.
And if you remember those days, you probably have seen a car or two on the side of the road with its gas cap opened, and the driver a few miles up the road looking for a place to get some emergency gas. These were often the folks that knew they were close to “E” but decided to see how far they could push the needle to the left.
So why does that not happen very often today? I’d argue most drivers don’t really care if their tank is half empty or half full, and aren’t interested in seeing how far below “E” they can go. It’s because today’s cars don’t just tell you how much gas you have in the tank but, more importantly, how many miles you can go until you run out. I have no idea how full my gas tank is. What I do know is I can get back and forth to work for 3 more days till I need to add more.
So if a car can calculate how many miles you can go before running out of gas, wouldn’t it be cool, and even useful, if a modern storage system could do the same? That’s why SLO-based provisioning is so important, and dare I say “game changing,” to storage folks.
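To make the “miles to empty” idea concrete, here is a back-of-the-envelope sketch. It assumes a simple straight-line trend over daily usage samples, which is my simplification and not a description of how any array computes it; the days_until_exhausted name and the sample numbers are hypothetical.

```python
def days_until_exhausted(limit, samples):
    """Estimate days left from daily usage samples, using the average daily growth."""
    if len(samples) < 2:
        return None  # not enough history to estimate a trend
    growth_per_day = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth_per_day <= 0:
        return None  # flat or shrinking usage: no projected run-out
    return (limit - samples[-1]) / growth_per_day

# Assumed daily IOPS consumption against a 100,000 IOPS budget.
iops_used = [60_000, 62_000, 64_000, 66_000, 68_000]
print(days_until_exhausted(100_000, iops_used))  # prints 16.0
```

The same arithmetic works for any resource you can meter, whether that is IOPS, cache, or plain old capacity.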
Whether you’re a half empty or half full person, you now have better visibility into what’s really available before you start to push the limits. Because as the storage folks know, whether it’s full, half full or empty, your users don’t care; they just expect to consume it.
For the folks who really want to see how it actually works, my good friend @dunfee16 has a cool, dare I say face melting, demo from his #EMCGeekPit. You can check it out here: http://t.co/kGcAYoNf9L