Should We KISS Workload Management Goodbye?

I am always surprised at how IT keeps revisiting problems from the past.  For example, in the 1990s, the constant refrain was homogeneity: the search for the universal system that could run any workload thrown at it.  Organizations eventually figured out that there was a need for specialist systems, so mainframes, UNIX, and Linux platforms continued to run alongside Windows, on which everyone had pinned so many hopes.  To make all these systems work together, enterprise application integration (EAI) evolved, morphing into enterprise service buses (ESBs) and web services, which were used to try to ensure that the disparate bits could still interoperate.

This turned out to be complex and expensive, so the effort then swung back toward a scale-out approach of commodity Intel-based servers running either Windows or Linux, giving a relatively homogeneous platform.

Now, greater abstraction through converged systems using advanced virtualization of the physical layer means that heterogeneity is back on the menu.  IBM now offers the PureFlex mixed Intel/Power platform alongside its zEnterprise mainframe/Power/Intel systems, and others such as HP, Dell, and Cisco build a “scale through” platform predominantly on Intel, but increasingly including server systems based on graphics processing units (GPUs), specialty processors such as the Java engines from Azul Systems, and microservers based on ARM and Intel Atom.  The physical layer is getting more “interesting”, but it is also going to cause issues for users.

The big issue is managing all the different workloads across the various platforms.  It’s all well and good having different hardware architectures capable of running specific workloads optimally, but if the workload can’t get to the right chunk of hardware at the right time, the whole effort is pointless.

The other, not so big issue — but one that is waiting to jump out and bite the unwary — is the use of the cloud.  It may well be that the best technical platform for a specific workload is out there in a remote cloud, but how do you get the workload out there in a timely manner?  Latency rears its ugly head, and getting a large workload image out to the right platform may be problematic if not managed effectively.
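To put a rough number on "timely", here is a back-of-the-envelope sketch of how long it takes just to ship a workload image over a network link. The function name and the figures are illustrative assumptions, not measurements, and the estimate ignores protocol overhead, contention, and round-trip latency, so real transfers will be slower.

```python
def transfer_seconds(image_gb: float, link_mbps: float) -> float:
    """Idealized time to move an image of image_gb gigabytes over a link
    with link_mbps megabits per second of usable bandwidth.

    Uses decimal units (1 GB = 8,000 megabits) and ignores overhead.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return image_gb * 8 * 1000 / link_mbps


# Hypothetical example: a 20 GB image over a 100 Mbit/s link.
seconds = transfer_seconds(20, 100)
print(f"{seconds / 60:.1f} minutes")
```

Even under these generous assumptions the move takes minutes rather than seconds, which is why the placement decision has to account for where the image already lives.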

So, there is a need for a new generation of workload manager: systems that identify exactly what resources are required, determine which technical architectures are best suited to the overall requirement, and then orchestrate the provisioning of the workload to the right place, in a timely manner.  This requires a lot of intelligence in the system, and although things are moving in the right direction, I’m not sure we are quite there yet.

Everything has to be done in real time.  The aim is to ensure that the right workload is serviced by the right resources to create the most responsive system possible.  If it takes seconds for a decision to be made as to where the workload should go — or even worse, if a sysadmin has to make a manual decision — the benefits will be lost.  There are so many variables involved that the software to fully automate such orchestration may get it wrong as often as it gets it right.

For now, it may be best to use a paraphrased KISS principle: Keep It a Simple System.  Where there are clearly definable workloads with different architectural needs, create a hard-coded means of allocating them to the right platform.  Where cloud is being used, decide whether a workload should be in the cloud or not before attempting to provision it.  Trying to be too clever could see you KISS your dreams of an effective IT platform goodbye.
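As a minimal sketch of what such a hard-coded approach might look like, the snippet below maps workload classes to platforms with a static table and records the cloud decision up front. Every workload class, platform name, and function here is hypothetical and chosen only for illustration; a real environment would substitute its own categories.

```python
from dataclasses import dataclass

# Hypothetical static mapping from workload class to target platform.
PLATFORM_FOR_CLASS = {
    "transactional": "mainframe",
    "java_app": "java_appliance",
    "numeric": "gpu_cluster",
    "web_frontend": "arm_microserver",
}

# Anything unclassified falls back to commodity servers.
DEFAULT_PLATFORM = "x86_commodity"

# Cloud eligibility is decided ahead of time, not at provisioning time.
CLOUD_ELIGIBLE = {"web_frontend", "numeric"}


@dataclass(frozen=True)
class Workload:
    name: str
    workload_class: str


def place(workload: Workload) -> tuple[str, bool]:
    """Return (platform, may_run_in_cloud) using table lookups only."""
    platform = PLATFORM_FOR_CLASS.get(workload.workload_class, DEFAULT_PLATFORM)
    cloud_ok = workload.workload_class in CLOUD_ELIGIBLE
    return platform, cloud_ok


print(place(Workload("billing", "transactional")))
print(place(Workload("storefront", "web_frontend")))
```

The point of the design is that placement is a constant-time lookup with no runtime inference, so there is nothing to get wrong at decision time; the trade-off is that someone has to maintain the table by hand as workloads change.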

Image: TaniaSaiz (flickr) – CC-BY-2.0