
How to Conquer Cold Starts for Better Performance

Learn what Microsoft is doing to reduce latency in Azure Functions and what you can do to speed up cold starts in your own apps.
Apr 25th, 2024 9:17am by
Featured image by Lieve Ransijn on Unsplash.

Latency is the enemy of nimble apps, and cold starts — the extra time it might take for a function that hasn’t been used recently to respond to an event — are just one cause of it. No matter when your functions were last called, you want lightning-fast triggers and no lag time while a function warms up.

Measuring and reducing cold starts in the platform, and optimizing your own functions, are both key to improving application performance and mitigating latency. Fortunately, there are things you can do to prevent a cold start from chilling your app’s performance. Before I share some tips, I’ll explain how Microsoft has approached the problem in Azure Functions.

How Azure Functions Mitigates Cold Starts

Since launching in 2016, Azure Functions has helped many customers efficiently develop highly scalable, event-driven logic using their programming language of choice. It is built for scale and developer productivity, and the function runtime can run anywhere, even on your own machine. That’s the beauty of serverless functions.

However, running only one execution per instance isn’t the most efficient model, so Azure Functions offers per-instance concurrency options (if your workload supports them). The benefit is not only better performance but also far fewer cold starts, because new instances aren’t started until the concurrency threshold is reached.
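To see why per-instance concurrency cuts cold starts, consider a simplified capacity model. This is a sketch under stated assumptions and does not describe how the Azure Functions scale controller actually decides. The function name `instances_needed` and the numbers are hypothetical, and the model assumes every instance accepts exactly the configured number of concurrent executions.

```python
import math


def instances_needed(concurrent_executions: int, per_instance_concurrency: int) -> int:
    """Minimum instances for a burst, assuming each instance runs exactly
    `per_instance_concurrency` executions at once (simplified, hypothetical model)."""
    if per_instance_concurrency < 1:
        raise ValueError("per_instance_concurrency must be at least 1")
    # Each extra instance beyond the warm ones is a potential cold start.
    return math.ceil(concurrent_executions / per_instance_concurrency)


print(instances_needed(40, 1))   # one execution per instance: 40 instances
print(instances_needed(40, 16))  # hypothetical limit of 16 per instance: 3 instances
```

In this model, a burst of 40 requests with one execution per instance needs 40 instances and potentially 40 cold starts. With a limit of 16 executions per instance, the same burst needs only 3.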

The first step to improving performance is measuring it. To measure Azure Functions’ performance, we prioritize the cold start of synchronous HTTP triggers in the consumption model. That means looking at what the platform and the Azure Functions host must do to execute the first HTTP trigger function on a new virtual machine (VM) instance. Then we improve it. We are also working on improving cold starts for asynchronous scenarios.

To assess our progress, we run sample HTTP trigger function apps that measure cold start latencies for all supported versions of Azure Functions, in all languages, for both Windows and Linux consumption. We deploy these sample apps in all Azure regions and subregions where Azure Functions runs. Our test function calls these sample apps every few hours to trigger a true cold start, and currently, it generates about 85,000 cold start samples daily.

In tests, we aim for the cold start to be a few hundred milliseconds at the 50th percentile and well below one second for the 99th percentile, across all regions and for all supported languages and platforms. We built a massive infrastructure to conduct these tests.
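Percentile targets like these can be checked with a few lines of code. The sketch below assumes latencies in milliseconds and uses the nearest-rank percentile definition. The article gives only approximate goals, so the threshold values are assumptions, and the sample data is made up for illustration.

```python
# Hypothetical targets: "a few hundred milliseconds" at p50 and "well below
# one second" at p99 are approximated here by assumed values.
P50_TARGET_MS = 300
P99_TARGET_MS = 1000


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) using integer math
    return ordered[rank - 1]


def meets_targets(samples_ms: list[float]) -> bool:
    """True if p50 is within its target and p99 stays below its target."""
    return (percentile(samples_ms, 50) <= P50_TARGET_MS
            and percentile(samples_ms, 99) < P99_TARGET_MS)


# Made-up cold start samples in milliseconds, not real measurements.
samples = [180, 210, 220, 250, 260, 270, 290, 310, 450, 900]
print(percentile(samples, 50))  # 260
print(percentile(samples, 99))  # 900
print(meets_targets(samples))   # True
```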

How We Handle Alerts

In the past 18 months, we’ve reduced cold start latency by approximately 53% across all regions and for all supported languages and platforms.

If any of the tracked metrics start to regress, we’re immediately notified and start investigating. Daily emails, alerts and historical dashboards provide the end-to-end cold start latencies across various percentiles. We also perform specific analyses and trigger alerts if our 50th, 99th or maximum latency numbers regress.
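A regression check of this kind could compare today's p50, p99 and maximum latency against a baseline. The following is a hypothetical sketch, not Microsoft's alerting pipeline. The 10% tolerance, the function names and the numbers are assumptions.

```python
def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) using integer math
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """The three metrics the article names as alert triggers: p50, p99 and max."""
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


def regressions(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.10) -> list[str]:
    """Names of metrics where current exceeds baseline by more than `tolerance`
    (an assumed relative threshold)."""
    return [name for name, base in baseline.items()
            if current[name] > base * (1 + tolerance)]


baseline = {"p50": 250, "p99": 800, "max": 1500}  # made-up numbers
today = {"p50": 255, "p99": 950, "max": 1500}
for metric in regressions(baseline, today):
    print(f"ALERT: {metric} regressed ({baseline[metric]} -> {today[metric]} ms)")
```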

We also collect detailed PerfView profiles of the sample apps deployed in select regions. The breakdown includes full call stacks (user mode and kernel mode) for every millisecond spent during the cold start. The profiles reveal CPU usage and call stacks, context switches, disk reads, HTTP calls, memory hard faults, common language runtime (CLR) just-in-time (JIT) compilation, garbage collection (GC), type loads and many other details about .NET internals. We report all these details in our logging pipelines and receive alerts if metrics regress, and we’re always looking for ways to make improvements based on these profiles.

The Path to Industrial-Strength Performance

Developers sometimes ask why it takes so long to improve performance. We aim high and optimize for 99th percentile latency — a challenging feat of detection and engineering. We delve into cold start scenarios at the millisecond level and continually fine-tune the algorithms that allocate capacity. Our main focus areas are:

  • Function app pools: In the internal architecture, we must have the right number of function app pools warmed up and ready to handle a cold start for all supported platforms and languages. These pools serve as placeholders, in effect. Exactly how many are needed depends on the usage per region plus enough extra capacity to meet unexpected bursts. We continually refine our algorithms to balance the pools without increasing costs.
  • 99th percentile latencies: Although it’s relatively straightforward to optimize cold start scenarios for the 50th percentile, it’s more important — yet far more difficult — to address 99th percentile latencies, particularly when multiple VMs are involved. Each runs different processes and components and is configured with specific disk, network and memory characteristics.
  • Profilers: Specialized profiling tools dissect cold start scenarios at the millisecond level, examining detailed call stacks and tracking activities at both the application and operating system levels. PerfView and the Event Tracing for Windows (ETW) providers are great for diagnosing issues with Windows and .NET-based apps, but we also investigate issues across platforms and languages. Sometimes the profiler itself introduces issues or reports false positives, which take even more time to track down.
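The pool-sizing idea from the first bullet can be sketched as expected demand plus burst headroom. The article doesn't describe the actual allocation algorithm, so this is only an illustration. The `StackDemand` type, the demand metric, the 50% headroom default and the minimum of one placeholder are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class StackDemand:
    region: str                  # Azure region name, e.g. "westeurope"
    stack: str                   # hypothetical OS + language label, e.g. "python-linux"
    expected_cold_starts: float  # assumed metric: cold starts expected per refill interval


def pool_size(demand: StackDemand, burst_headroom: float = 0.5, minimum: int = 1) -> int:
    """Placeholders to keep warm: expected demand plus a burst margin, never below `minimum`."""
    return max(minimum, math.ceil(demand.expected_cold_starts * (1 + burst_headroom)))


demands = [
    StackDemand("westeurope", "python-linux", 10),
    StackDemand("eastus", "dotnet-windows", 4),
    StackDemand("eastus", "node-windows", 0),
]
# One pool per (region, stack); in this model even an idle stack keeps one placeholder.
pools = {(d.region, d.stack): pool_size(d) for d in demands}
print(pools)
```

Raising `burst_headroom` absorbs larger unexpected bursts but keeps more capacity idle. That is the cost trade-off the bullet describes.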

6 Ways to Improve Cold Starts in Azure Functions

We’ve learned a lot from our trials about minimizing the impact of cold starts on app performance. Here are a few strategies you can use to analyze and further improve cold starts in your own apps:

  1. Deploy your function as a .zip (compressed) package. Minimize its size by removing unneeded files and dependencies, such as debug symbols (.pdb files) and unnecessary image files.
  2. For Windows deployment, run your functions from a package file. To do this, use the WEBSITE_RUN_FROM_PACKAGE=1 app setting. If your app uses Azure Storage to store content, deploy Azure Storage in the same region as your Azure Functions app and consider using premium storage for a faster cold start.
  3. When deploying .NET apps, publish with ReadyToRun to reduce the startup cost of just-in-time (JIT) compilation.
  4. In the Azure portal, navigate to your function app. Go to Diagnose and solve problems, and review any messages that appear under Risk alerts. Look for issues that may impact cold starts.
  5. If your app uses an Azure Functions Premium or App Service plan, use a warmup trigger, which runs when a new instance is added, to preload dependencies or run custom logic required to connect to external endpoints. (This option isn’t supported for apps on consumption plans.)
  6. Try the “always ready instances” feature in our newest hosting option for event-driven serverless functions, Flex Consumption, which is in early access preview. This plan supports long function execution times and includes private networking, instance size selection, concurrency control, and fast and large scale-out features on a serverless model.
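The packaging advice in the first strategy can be automated. The sketch below uses Python's standard `zipfile` module to build a .zip package while skipping debug symbols and folders that a function app typically doesn't need at run time. The exclusion lists and the function names are assumptions, so review them against what your app actually loads.

```python
import zipfile
from pathlib import Path

# Assumed exclusion rules for illustration; adjust them to your app.
EXCLUDED_SUFFIXES = {".pdb"}  # debug symbols
EXCLUDED_DIRS = {".git", ".vscode", "__pycache__", "tests"}


def should_include(rel_path: Path) -> bool:
    """Skip debug symbols and anything under an excluded directory."""
    if rel_path.suffix.lower() in EXCLUDED_SUFFIXES:
        return False
    return not any(part in EXCLUDED_DIRS for part in rel_path.parts)


def build_package(source_dir: str, zip_path: str) -> list[str]:
    """Zip source_dir into zip_path, skipping excluded files; return the included paths."""
    root = Path(source_dir)
    target = Path(zip_path).resolve()
    included: list[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Directories are implied by file paths; never zip the package into itself.
            if not path.is_file() or path.resolve() == target:
                continue
            rel = path.relative_to(root)
            if should_include(rel):
                zf.write(path, arcname=rel.as_posix())
                included.append(rel.as_posix())
    return included
```

For a .NET app, you would typically point `source_dir` at the publish output folder and write the package to a location outside it.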

Still Feeling the Chill of a Cold Start?

If your Azure Functions app still doesn’t perform as well as you’d like:
