What if you, as a data center manager, suddenly discovered you had double the data center capacity you thought you had?
Grossly underestimating capacity is actually much more common than you might think. One client recently showed me their data center, which they claimed was "completely full," but they were using only half of the 2.5MW circuit they were paying for each month. Their question was why.
They explained their planning assumptions, which were very rational: 10kW racks, spec'd never to exceed 8kW. They had bought the latest big-name blade servers, which, according to the manufacturer's website, were rated for a maximum power consumption of 7200W, even though each one took up just one-third of the rack. A second server at that rating would push a rack well past its 8kW limit, so each rack held only one. The racks were physically only one-third full, yet based on expected power consumption they appeared to be completely full.
Enter a reality check. We took an identical server and measured its actual power consumption, from boot-up to shutdown and from zero load to full load, to get a complete picture of the server's power consumption under real-world conditions. The most power it ever used was 3500W, and that was under full load with all memory used.
We then put in their actual use curve and found that they hit the 3500W scenario less than 1% of the time. Under "normal" use (about 99% of the time), the blade server never consumed more than 2500W.
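A minimal sketch of that kind of analysis, assuming you already have a list of power readings taken at even intervals. The trace values and the function name are hypothetical, chosen only to mirror the figures above; they are not the client's measurements.

```python
def summarize_power_trace(samples_w, threshold_w):
    """Return (peak draw in watts, fraction of samples above threshold_w)."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    peak_w = max(samples_w)
    above = sum(1 for w in samples_w if w > threshold_w)
    return peak_w, above / len(samples_w)


# Hypothetical trace: 1,000 evenly spaced readings, in watts.
trace_w = [2200] * 600 + [2500] * 395 + [3500] * 5

peak_w, frac_above = summarize_power_trace(trace_w, threshold_w=2500)
print(f"peak: {peak_w} W, time above 2500 W: {frac_above:.1%}")
```

Because the readings are evenly spaced, the fraction of samples above the threshold approximates the fraction of time spent there.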
The picture becomes much clearer now. The company could easily install two of these blade servers per rack and never hit their 8kW max. And if they needed additional capacity, they could install three blade servers per rack and power cap them so that even under their 1% scenario (which was now two-thirds less likely to take place, given the added capacity) they wouldn't trip a breaker.
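The arithmetic behind those numbers can be sketched as follows. The wattages come from the example above; the function names and the even split of the rack budget across capped servers are assumptions made for illustration, not a description of how any particular server's power capping works.

```python
RACK_LIMIT_W = 8000      # per-rack limit the racks were spec'd never to exceed
SLOTS_PER_RACK = 3       # each blade server takes one-third of a rack
NAMEPLATE_W = 7200       # manufacturer's rated maximum
MEASURED_PEAK_W = 3500   # highest draw measured under full load
TYPICAL_MAX_W = 2500     # highest draw seen about 99% of the time


def servers_per_rack(per_server_w, rack_limit_w=RACK_LIMIT_W):
    """How many servers fit, limited by both power budget and physical space."""
    return min(rack_limit_w // per_server_w, SLOTS_PER_RACK)


def per_server_cap_w(n_servers, rack_limit_w=RACK_LIMIT_W):
    """Assumed policy: split the rack's power budget evenly across servers."""
    return rack_limit_w // n_servers


for label, watts in [("nameplate", NAMEPLATE_W),
                     ("measured peak", MEASURED_PEAK_W),
                     ("typical max", TYPICAL_MAX_W)]:
    print(f"{label}: {servers_per_rack(watts)} server(s) per rack")

print(f"cap with three servers: {per_server_cap_w(3)} W each")
```

With three servers, an even split gives each a cap above the 2500W typical maximum, so under these assumptions the cap would only come into play during the rare full-load peaks.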
We then modeled the entire data center for them. They could easily double the number of blade servers installed, which would mean not only having twice the compute power, but also coming much nearer to filling the 2.5MW circuit they were paying for. And if needed they could triple their capacity and protect themselves by using the power capping capability built into their new servers.
So why is this type of overprovisioning so common? The problem starts in the design phase, when typical methods for estimating needed resources virtually guarantee "hidden" capacity. Data center designers and IT managers use a reasonable process: estimate average load and peak load for an application, then add room for growth. To do so, the designers use the only source of information on server load they have: the figures the manufacturers provide. From testing hundreds of servers, we've found that the reported power consumption is almost always overstated (although a few types of servers actually understate power consumption, which is even more problematic).
While it's easy in hindsight to see that overprovisioning is a problem, it's harder to justify changing this process in the absence of real information. If you don't have information about your servers' power usage characteristics, you can't risk tripping the breakers. So, what's a better approach?
Ironically, the information is all there: it just has to be gathered and analyzed.
To optimize your data center capacity, you'll need to answer certain questions. The answers to those questions will help you use resources more efficiently to avoid systemic overcapacity.
There are three steps to optimizing your data center:
- Measure to find out how much capacity you're really using
- Analyze to figure out possible ways forward
- Optimize to make the most of your data center
What Data Do You Need to Optimize Your Data Center?
In order to determine how much power you really need, you'll need answers to these questions:
- How much power is your data center really using? The more granular the information about power usage, the better. How much power is being used on each rack and by each server? If you can then correlate that with your utilization on each server, you can start to understand where (and how much) power is being wasted and what you can do to change it.
- How efficient are the servers? Are the most important applications running on newer servers or on older servers that use more power and are less efficient? You might be surprised how quickly a new server can pay for itself. With Moore's Law showing no signs of slowing down and power costs on the rise, it may make sense to upgrade more often than you think.
- How much extra load can the data center handle? Determining the true power capacity of servers as I outlined earlier gives you the real story.
- What is the total compute capacity and how is that capacity being used? More importantly, how close are you to running out of capacity? If you run out, latency increases and customers get the dreaded "server unavailable" message. You don't want to be so efficient that you don't have enough capacity to meet customer needs.
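As one way to approach the first question, here is a rough sketch of correlating per-server power with utilization. The server names, readings, and the watts-per-utilization-point metric are all hypothetical, meant only to show the shape of the analysis.

```python
from dataclasses import dataclass


@dataclass
class ServerReading:
    name: str
    avg_power_w: float
    avg_cpu_util_pct: float  # average CPU utilization, 0 to 100


def watts_per_util_point(reading):
    """Watts drawn per percentage point of CPU utilization (higher is worse)."""
    if reading.avg_cpu_util_pct <= 0:
        return float("inf")  # drawing power while doing no measurable work
    return reading.avg_power_w / reading.avg_cpu_util_pct


# Hypothetical readings for three servers.
readings = [
    ServerReading("web-01", avg_power_w=300, avg_cpu_util_pct=60),
    ServerReading("db-01", avg_power_w=600, avg_cpu_util_pct=75),
    ServerReading("legacy-01", avg_power_w=450, avg_cpu_util_pct=10),
]

# Rank servers from most to least power spent per unit of useful work.
ranked = sorted(readings, key=watts_per_util_point, reverse=True)
for r in ranked:
    print(f"{r.name}: {watts_per_util_point(r):.1f} W per utilization point")
```

Servers at the top of such a ranking are candidates for consolidation or replacement, though CPU utilization alone is a crude proxy for useful work.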
Data Center Analytics
The above is just one small example of how data center analytics enables you to move from guessing to knowing what you need and how to optimize it. And if you can save money in one area, you can pour it back into another, more strategic area of your company that can improve the bottom line.
Data center analytics takes a holistic view of your data center. To understand how efficient your data center truly is, you need to look beyond your facility's efficiency (your PUE) at each of the following key elements and their relative capacity and efficiency: cooling, power, CPU, memory, network, storage, and space.
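For reference, PUE (power usage effectiveness) is total facility power divided by the power delivered to IT equipment, so a value of 1.0 would mean no overhead at all. A small sketch, with hypothetical figures:

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical figures: 1,250 kW drawn by the facility, 800 kW reaching IT gear.
example_pue = pue(total_facility_kw=1250, it_equipment_kw=800)
print(f"PUE: {example_pue:.2f}")
```

PUE says nothing about whether the IT load itself is doing useful work, which is why the other elements listed above matter.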