Opinion Piece: Horizontal and Vertical Scaling in the Cloud

The concept of Auto Scaling in the Public or Private Cloud is fraught with confusion and complexity.

Cloud clients constantly struggle not only with the differences between horizontal and vertical scaling, but also with when to use which.

Some definitions:
Horizontal Auto Scaling is the concept of adding cloned servers to an existing server farm and bringing those servers into service behind a Load Balancer of some sort, so that they share the additional load. Horizontal Auto Scaling should always be implemented using a set of rules that define the upper and lower limits on the number of servers, as well as the performance thresholds that must be breached, and for how long, before virtual servers are created or destroyed. The concept itself is great; however, it should never be implemented without control. An organisation should always understand its business IT consumption in order to predict and define these rules with a level of certainty.

Implementing Horizontal Scaling would require the following minimum configuration settings:

  • Minimum # Servers
  • Maximum # Servers
  • Name of Template to provision from
  • Name of Load Balancer Server Farm to inject or remove servers
  • Maximum and minimum CPU / RAM thresholds, and how long a breach must last. For example, if a server exceeds 70% CPU for 5 minutes then create a new server, but if a server drops below 20% CPU then destroy it
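
As a rough illustration, the settings above can be sketched as a small rule in Python. This is a minimal sketch, not any vendor's API: the names HorizontalPolicy, decide, "web-template" and "web-farm" are hypothetical, the default limits of 2 and 10 servers are invented for the example, and the sketch assumes the same 5 minute window applies to scaling in as well as scaling out.

```python
from dataclasses import dataclass


@dataclass
class HorizontalPolicy:
    # All names and default values here are illustrative assumptions.
    min_servers: int = 2
    max_servers: int = 10
    template_name: str = "web-template"   # hypothetical template name
    server_farm: str = "web-farm"         # hypothetical load balancer farm
    scale_out_cpu: float = 70.0           # % CPU that triggers a new server
    scale_in_cpu: float = 20.0            # % CPU below which a server is removed
    breach_minutes: int = 5               # how long a breach must last


def decide(policy, current_servers, cpu_samples):
    """Return +1 (add a server), -1 (remove a server) or 0 (do nothing).

    cpu_samples holds one average CPU % reading per minute, newest last.
    """
    window = cpu_samples[-policy.breach_minutes:]
    if len(window) < policy.breach_minutes:
        return 0  # not enough history to judge a sustained breach
    if all(s > policy.scale_out_cpu for s in window):
        # Sustained high load: grow, but never past the upper limit.
        return 1 if current_servers < policy.max_servers else 0
    if all(s < policy.scale_in_cpu for s in window):
        # Sustained low load: shrink, but never below the lower limit.
        return -1 if current_servers > policy.min_servers else 0
    return 0
```

The minimum and maximum server counts act as the "control" on the rule: even a sustained breach cannot push the farm beyond them.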

Vertical Auto Scaling is the concept of increasing the CPU, RAM or Disk capacity of a single server in order to cater for sudden load. For many organisations this is a great tool not only for Right Timing resources in a Cloud Environment (that is, having resources running only when they need to be), but also as a means to Right Size the provisioning of servers. Many enterprise organisations do not fully understand the sizing requirements of their applications or workloads, and they take advantage of Vertical Auto Scaling in order to arrive at the correct sizing of their Cloud Servers.

On further thought, it may be of some value to specify a downtime window and track performance over a day to see whether more or fewer resources are required, then simply action the change in the downtime window. This makes sense because servers typically reboot during configuration changes, which could result in repeated downtime if changes are actioned too often in a 24 hour window.
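
The downtime window idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 70% and 20% thresholds are borrowed from the horizontal example earlier in this piece, and the day's load is judged by a simple average of the CPU samples.

```python
from datetime import time
from statistics import mean


def in_downtime_window(now, start, end):
    """True if the time `now` falls inside the window [start, end).

    Handles windows that cross midnight, e.g. 23:00 to 01:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def daily_recommendation(cpu_samples, high=70.0, low=20.0):
    """Summarise a day's CPU % samples into a sizing recommendation."""
    average = mean(cpu_samples)
    if average > high:
        return "increase"
    if average < low:
        return "decrease"
    return "keep"
```

A scheduler would compute the recommendation once per day and apply it only when in_downtime_window returns True, so the server reboots at most once in any 24 hour period.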

The concept was not specifically designed for this purpose; however, it will work well to achieve the desired effect.

Implementation of Vertical Auto Scaling is very much like that of Horizontal; however, the minimum and maximum limits refer to the minimum and maximum CPU and RAM allowed for the single server.

Specific knowledge of load balancers and templates would not be required as there is no need for orchestrating changes involving either of these.

Performance thresholds would be in place to determine when to invoke additional capacity and when to remove it.
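
A vertical rule along these lines might look like the sketch below. All names (VerticalPolicy, next_size) and all default values are hypothetical, chosen only for illustration, and the step sizes of 1 vCPU and 2 GB of RAM are assumptions rather than recommendations.

```python
from dataclasses import dataclass


@dataclass
class VerticalPolicy:
    # Limits apply to a single server; values are illustrative only.
    min_vcpus: int = 2
    max_vcpus: int = 8
    min_ram_gb: int = 4
    max_ram_gb: int = 32
    high_pct: float = 70.0   # add capacity above this utilisation
    low_pct: float = 20.0    # remove capacity below this utilisation


def _step(current, usage_pct, lower, upper, policy, step):
    """Move one step up or down, clamped to the [lower, upper] limits."""
    if usage_pct > policy.high_pct:
        return min(current + step, upper)
    if usage_pct < policy.low_pct:
        return max(current - step, lower)
    return current


def next_size(policy, vcpus, ram_gb, cpu_pct, ram_pct):
    """Return the (vcpus, ram_gb) the server should be resized to."""
    new_vcpus = _step(vcpus, cpu_pct, policy.min_vcpus, policy.max_vcpus, policy, 1)
    new_ram = _step(ram_gb, ram_pct, policy.min_ram_gb, policy.max_ram_gb, policy, 2)
    return new_vcpus, new_ram
```

Note that nothing here refers to a template or a load balancer: the only inputs are the server's current size, its utilisation and the limits.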

It is also not recommended to implement both together unless there is very tight control over when each one is used.