Life used to be so good. An operating system was loaded onto a server of some sort, an application stack was loaded on top of that and, provided the workload remained fairly static, performance was more or less guaranteed.
The problem was that most workloads weren't static. Resource ceilings were often hit, and application performance suffered. The move to cloud computing was touted as the great solution to this: application workloads could be given enough server, network, and storage resources dynamically to avoid any performance degradation.
The trouble is – it doesn’t seem to work very well.
Cloud is proving to be more complex than many people (and vendors and service providers) envisaged. Most organizations will find themselves with a hybrid mix of systems – some clusters running single workloads, some virtualized environments running a few workloads, and a mix of private and public clouds sharing multiple workloads. Ensuring that everything runs effectively is a big challenge. A systems architect trying to work all of this out is in for a tough time.
Consider a private cloud, which will host a fixed set of workloads. Architecting the platform to cope with all workloads peaking at the same time will result in massive over-provisioning for average usage; architecting for average utilization is likely to result in constant resource bottlenecks that hurt the performance of every application. Using the public cloud to absorb bursts of demand may be possible – but which workloads will still work well with storage accessed over the WAN, or with a compute platform split between two different datacenters?
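The size of that trade-off can be sketched with a few lines of arithmetic. The workload figures below are entirely hypothetical; the point is only how far apart the two sizing strategies land.

```python
# Illustrative sketch (hypothetical numbers): the gap between sizing a private
# cloud for simultaneous peaks versus sizing it for average demand.

# Each tuple is (average, peak) CPU demand in cores for one workload.
workloads = [(4, 32), (8, 24), (2, 16), (6, 40), (10, 20)]

avg_total = sum(avg for avg, _ in workloads)     # capacity if sized for averages
peak_total = sum(peak for _, peak in workloads)  # capacity if every peak coincides

print(f"Sized for average demand:    {avg_total} cores")
print(f"Sized for simultaneous peaks: {peak_total} cores")
print(f"Over-provision factor:        {peak_total / avg_total:.1f}x")
```

With these numbers the peak-sized platform carries more than four times the capacity the average demand actually uses – capacity that sits idle almost all of the time, which is exactly the cost that bursting to a public cloud is meant to avoid.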
Automation seems to be the key to managing all of this, but where will such automation come from? Application performance monitoring (APM) will be a necessary part of the mix in order to identify the root cause of any problems. A deep understanding of the relationships between server, storage, and network resources will also be required, along with some sort of rules base that reflects the organization's risk position on where it believes workloads should run – even if that position is more gut feel than anything based on fact.
Any tools must be able to respond in real time, before performance issues become noticeable to users. As cloud computing develops, they will also need to identify the best place for a workload to reside – not just whether it should sit in the private or public cloud, but whether it should run on an x86 platform, an ARM chip, an Nvidia GPU, an Azul Systems appliance, or some other "engine". Knowledge of how optimization tools operate – for example, WAN acceleration, data deduplication, and fabric networks – needs to be built in.
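A rules base of this kind can be surprisingly simple at its core: an ordered list of policies, evaluated first-match-wins. The following is a minimal sketch, in which every field name, threshold, and placement label is hypothetical – a real workload manager would evaluate live telemetry rather than static attributes.

```python
# Minimal sketch (all names and thresholds are hypothetical) of a rules base
# encoding an organization's placement policy for workloads.

RULES = [
    # (predicate, placement) -- the first matching rule wins.
    (lambda w: w["data_sensitivity"] == "high", "private-cloud"),
    (lambda w: w["latency_ms_target"] < 10,     "private-cloud"),
    (lambda w: w["gpu_required"],               "public-cloud-gpu"),
    (lambda w: True,                            "public-cloud"),  # catch-all default
]

def place(workload):
    """Return the placement named by the first rule whose predicate matches."""
    for predicate, placement in RULES:
        if predicate(workload):
            return placement
    raise ValueError("no rule matched")  # unreachable: last rule is a catch-all

workload = {"data_sensitivity": "low", "latency_ms_target": 50, "gpu_required": True}
print(place(workload))  # -> public-cloud-gpu
```

The ordering of the rules is itself the risk position: putting the data-sensitivity rule first means a "gut feel" policy about where sensitive workloads belong overrides any purely technical placement further down the list.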
The tool should be able to advise, at a business level, when it makes technical sense for a complete application stack to be moved from one type of cloud to another – or even from a "less-cloud" cluster or virtual environment into the cloud – and should be able to carry out that move through the use of containers and virtual resource pointers. As part of the move, it must be able to identify whether every part of the application needs to be moved, or whether it can run as a composite app, with functions called from different parts of the cloud to meet the business's needs.
All of this needs a lot of measurement and monitoring across the hybrid platform, which is why cloud standards must be of high quality and sufficient capability. It is also why intelligent workload management tools need to be open, not aimed at a single hardware platform or cloud system.
Finally, the main reason why intelligent workload management will be a necessity is the rapid impact of the Internet of Things (IoT). Without good workload management, IoT devices will swamp the network with their data: intelligent workload managers need the capability to identify noise-level data and filter it out before it gets too far along the network.
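One common way to do that filtering at the edge is deadband (report-by-exception) filtering: only forward a reading when it differs meaningfully from the last one sent. The sketch below assumes hypothetical sensor readings and a hypothetical significance threshold.

```python
# Sketch (field names and threshold are hypothetical) of edge-side deadband
# filtering: drop noise-level IoT readings before they travel up the network.

def significant(reading, last_sent, delta=0.5):
    """Forward a reading only if it differs from the last sent one by >= delta."""
    if last_sent is None:
        return True  # always forward the first reading
    return abs(reading["value"] - last_sent["value"]) >= delta

readings = [{"value": v} for v in (20.0, 20.1, 20.2, 21.0, 21.1, 25.0)]

forwarded = []
last = None
for r in readings:
    if significant(r, last):
        forwarded.append(r)
        last = r

print([r["value"] for r in forwarded])  # -> [20.0, 21.0, 25.0]
```

Here six raw readings collapse to three that actually carry information; at the scale of millions of devices, that kind of reduction is the difference between a usable network and a swamped one.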
Is it asking too much? Is it possibly a step backwards to the horrible days of systems management frameworks? Is the level of intelligence required beyond the capabilities of today’s technology? Hopefully, the answer is “no” to all three questions. Tools are already available to monitor the performance of applications – even ones that are cloud-based. Root-cause analysis and remediation tools are also available. Data analytics and machine-to-machine (M2M) data tools are mature enough to deal with the big data issues around workload management.
It is possible – and companies such as CA, BMC, IBM and others are working towards such systems, if a little slowly. A full-function system is still some way off – but steps are being made in the right direction.
When looking for a cloud management system, make sure that you ask the vendor what their views are on intelligent workload management – and if they don’t have much of an answer, look elsewhere.