Dynamic Scaling: Why Shifting Right Is the Smart Approach
The software development culture often emphasizes the “shift left” approach, where tasks traditionally performed later in the development lifecycle are integrated early. This includes practices like Infrastructure as Code (IaC), Testing as Code (TaC), and Security as Code (SaC). Tools like Terraform and Pulumi define infrastructure declaratively, and those definitions are version-controlled in source control alongside the application.
However, there are scenarios where “shifting right” is beneficial. Shifting right involves defining specific settings at runtime rather than during the initial stages of development.
This approach is particularly advantageous for dynamic aspects of a system, such as autoscaling conditions. Unlike the cardinal sin of testing in production, setting autoscaling parameters at runtime is practical and often necessary.
Keep reading to explore two compelling cases for shifting right: dynamic scaling and workload rightsizing.
Dynamic Scaling: A Case for Runtime Configuration
Autoscaling conditions vary significantly based on day-to-day usage, influenced by factors like customer load and industry seasonality. For instance, a microservice’s CPU requirements might fluctuate by as much as 1,000 CPUs between peaks and valleys.
Static autoscaling configurations defined in code invite inefficiency: the system either overprovisions resources, driving up costs, or underprovisions them, degrading performance.
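To make that concrete, here is a minimal sketch of a static autoscaling policy as it might live in a repository; the workload name and thresholds are hypothetical. Whatever values survive code review are the values the cluster runs with, whether traffic needs three replicas or three hundred:

```yaml
# A static HorizontalPodAutoscaler committed to source control.
# The target and bounds are hypothetical placeholders; once merged,
# they stay fixed no matter how real traffic behaves.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout               # hypothetical workload
  minReplicas: 5                 # tuned once, at review time
  maxReplicas: 20                # peak demand may blow straight past this
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # a one-size guess at “busy”
```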
In a real-world example, a cluster with over 8,000 pods and 600 unique workloads experienced CPU utilization ranging from 13% during low usage to 26% during peak times. By implementing runtime automation for autoscaling, the cluster could dynamically adjust resources, significantly improving efficiency and reducing costs.
Implementing Runtime Automation
To effectively implement runtime automation, it’s crucial to establish guardrails within which the system can operate autonomously. This approach doesn’t conflict with the principles of IaC; rather, it complements them by providing flexibility within defined boundaries.
A common mistake when setting up runtime automation is not being open enough with instance types and sizes.
For example, restricting to a single instance type (the “node group” mentality) can limit flexibility. AWS alone offers over 800 instance types, many of which share similar CPU and memory characteristics but differ in additional features.
The solution? Dynamic instance sizing based on workload requirements ensures better resource utilization and cost efficiency.
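As a sketch of what that can look like, a node autoscaler such as Karpenter accepts a set of constraints instead of a pinned instance type; all names and values below are illustrative, and the referenced EC2NodeClass is assumed to be defined elsewhere:

```yaml
# A Karpenter NodePool (v1beta1 API) that declares constraints rather
# than pinning one instance type; names and values are illustrative.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        # Any compute-, general-, or memory-optimized family qualifies,
        # so the autoscaler can pick the best-priced fit at runtime.
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16", "32"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        name: default            # assumed EC2NodeClass, defined separately
  # Guardrail: the pool can never grow beyond 1,000 CPUs in total.
  limits:
    cpu: "1000"
```

Note that this manifest still lives in source control, so the guardrails themselves remain code; only the choices made inside those guardrails happen at runtime.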
Workload Rightsizing Works Best at Runtime, Too
Workload rightsizing involves adjusting resource requests and limits based on actual usage rather than static configurations. In Kubernetes, this means deriving each workload’s CPU and memory requests and limits from observed consumption and updating them as the environment’s needs change.
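One way to express this in Kubernetes today is the Vertical Pod Autoscaler; here is a minimal sketch, with a hypothetical target and placeholder bounds:

```yaml
# A VerticalPodAutoscaler that derives requests from observed usage;
# the target Deployment and the bounds are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # hypothetical workload
  updatePolicy:
    updateMode: "Auto"           # today, applies changes by evicting pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # Guardrails: the autoscaler tunes freely inside these bounds.
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```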
A current limitation is that applying new requests and limits requires restarting the workload; in-place pod resizing, which removes that restart, is being addressed in upcoming Kubernetes versions (e.g., Kubernetes 1.30).
Tools that monitor and analyze real CPU and memory usage can generate appropriate resource settings, enabling environments to self-tune for optimal performance and cost savings. The right settings also depend on the environment: in development, CPU compression (throttling when nodes are contended) is an acceptable trade-off to reduce costs, while production environments require stronger performance guarantees.
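To make that environment distinction concrete, here is a hypothetical pair of specs for the same container: the dev variant requests less CPU than it may use, so its CPU is compressed (throttled) under contention, while the prod variant sets requests equal to limits for Guaranteed quality of service:

```yaml
# Hypothetical resource settings for the same service in two environments.
apiVersion: v1
kind: Pod
metadata:
  name: api-dev
spec:
  containers:
    - name: api
      image: example/api:latest    # placeholder image
      resources:
        requests:
          cpu: 250m                # below expected usage; CPU gets throttled
          memory: 256Mi            # under contention, acceptable in dev
        limits:
          memory: 512Mi            # Burstable QoS
---
apiVersion: v1
kind: Pod
metadata:
  name: api-prod
spec:
  containers:
    - name: api
      image: example/api:latest    # placeholder image
      resources:
        requests:                  # requests == limits -> Guaranteed QoS,
          cpu: "2"                 # predictable performance in production
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 2Gi
```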
Conclusion
Teams adopting runtime automation for node and workload sizing can achieve significant cost savings and performance improvements. By shifting right for certain dynamic settings, they can leverage real-time adjustments to optimize resource utilization and reduce waste.
Overall, the compounded gains from adjusting both logical resources (workload requests and limits) and physical resources (the nodes underneath them) lead to a more resilient and cost-effective cloud infrastructure.
How does this type of automation work in practice?
The graph below shows an autoscaler scaling cloud resources up and down in line with real-time demand, while keeping enough headroom to meet the application’s requirements.
This example comes from Akamai, one of the world’s largest and most trusted cloud delivery platforms. The company used real-time autoscaling to reduce its cloud bill by 40-70%, depending on the workload. Check out the complete case study to learn more.