Multi-Layer Auto-Scaling in Kubernetes

I’m often asked how to set up auto-scaling in Kubernetes. But unlike pure compute scaling, auto-scaling a data pipeline brings more dimensions into the equation.


Several years ago, while leading development at an IDF (Israeli Defence Force) technological unit, I faced a major challenge: how to scale a massive data pipeline. We ingested various sources in parallel, including streaming and complex data, with dozens of pipelines to maintain. The mission was to alert on potential human or hostile cyber activity within minutes. On such occasions, the last thing you want is an infrastructure that can’t scale or support the volume. At that time, we used two approaches to scale:

  1. Build the pipeline architecture to handle maximum capacity at peak time. The problem with this approach was that it was highly inefficient, speculative, and not cost-effective. 
  2. Let DevOps assign more capacity to the pipeline when needed (the long-term approach). This may be more effective in terms of resource management, but in reality it was mostly triggered only after resources ran out or a pipeline failed. 

Both approaches were static and couldn’t adjust dynamically in a timely manner.

The 2 dimensions of auto scaling in Kubernetes

Over the last decade, scaling technology has gone through a massive evolution. Nowadays, Kubernetes supports auto-scaling: it can automatically adjust the size of your clusters and add pods when needed.

Many people wonder how to set up auto-scaling in Kubernetes effectively. In fact, unlike compute-only workloads, scaling data pipelines has two dimensions:

  1. Compute  
  2. Data flux

Not all pipelines are the same – some are heavy consumers of CPU/memory, while others need more network/IO resources. That’s why better data flux isn’t always achieved by adding more compute to the pipeline; the inefficiency might actually stem from a shortage of network/IO resources. 
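To make the distinction concrete, per-container resource requests in a Pod spec are how compute-heavy needs are declared to Kubernetes; network/IO pressure has no equivalent field, which is why scheduling on compute alone can miss the real bottleneck. A minimal sketch (the names and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-pipeline            # hypothetical pipeline Pod
spec:
  containers:
    - name: transformer
      image: example.com/transformer:latest   # hypothetical image
      resources:
        requests:
          cpu: "2"                    # a compute-bound stage asks for CPU up front
          memory: 4Gi
        limits:
          cpu: "4"                    # hard ceiling before throttling
          memory: 8Gi
```

An IO-bound pipeline could run with a fraction of these requests and still be the one that is saturated.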

The Datorios approach offers multi-layer smart scaling that’s based on 3 core principles.

The 3 Datorios principles 

To illustrate, let’s assume that each data pipeline runs in one Kubernetes Pod. Within the Datorios framework, each data transformer essentially functions as a worker inside that Pod. An orchestration layer continuously monitors latency and data backlog for each transformer (worker), and scaling then adjusts dynamically at two levels:

1. Horizontal auto scaling

Each Pod implements a pipeline, up to the point where it exhausts its reserved resources (CPU, RAM). The platform then automatically scales out, creating another Pod and assigning it to the relevant pipeline.
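In plain Kubernetes, this behavior resembles a HorizontalPodAutoscaler, which adds replicas when resource utilization crosses a threshold. A minimal sketch, assuming the pipeline runs as a Deployment (the names are hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pipeline-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-pipeline               # hypothetical pipeline Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80      # add a Pod once average CPU passes 80%
```

The `autoscaling/v2` API also supports custom metrics, which is how a backlog- or latency-based signal like the one described above would be wired in.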


2. Vertical auto scaling

Datorios dynamically adjusts the Pod based on real-time pipeline logic. For example, a pipeline that’s currently blocked on network I/O isn’t using all of its compute resources. In such a case, our framework automatically adjusts and processes queued events concurrently. This makes better use of the Pod’s resources without negatively impacting a single event’s processing latency. 
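Kubernetes offers a related (though coarser) mechanism in the Vertical Pod Autoscaler, a separately installed component that resizes a Pod’s resource requests rather than its internal concurrency. A minimal sketch, with hypothetical names:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: pipeline-vpa                  # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-pipeline               # hypothetical pipeline Deployment
  updatePolicy:
    updateMode: "Auto"                # let VPA apply new requests automatically
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Note the difference: VPA changes how much a Pod may request, while the approach described above changes how the work inside the Pod is parallelized.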

3. A smart orchestration layer

In addition to the two levels of scaling, a truly transformative, modern data framework also needs a management layer. Our management layer automatically orchestrates pipeline scaling based on a defined set of rules, facilitating smart auto-scaling.


Not all pipelines are born equal. In addition to serving different business needs, some pipelines get higher priority than others, so it’s important to prioritize your resources accordingly. Compare, for example, a pipeline that supports traffic management during critical “rush hours” with one that feeds an occasional BI tool. This brings us to the concept of Selective Scaling, where resources are allocated to pipelines dynamically – only when needed, and for a predefined duration. 
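In Kubernetes terms, relative pipeline priority can be expressed with a PriorityClass, so that when resources are contended the scheduler favors (and preempts in favor of) the critical pipelines. A minimal sketch, with hypothetical class names:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-pipeline             # hypothetical class for rush-hour pipelines
value: 1000000                        # higher value = scheduled first, evicted last
globalDefault: false
description: "Pipelines that must keep scaling during peak load."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-pipeline          # hypothetical class for BI-support pipelines
value: 1000
globalDefault: false
description: "Pipelines that may wait when resources are contended."
```

A pipeline Pod opts into a class via `priorityClassName` in its spec; the time-boxed “predefined duration” aspect of Selective Scaling would live in the orchestration layer’s own rules.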

The bottom line is that linearly adding more pods struggles to address scaling efficiently. When you think about auto-scaling, make sure to consider both scaling dimensions, horizontal and vertical. In other words, implement a smart orchestration layer that scales based on actual business and operational needs.
