Auto scaling Pinterest
At Pinterest, infrastructure efficiency is one of our top priorities. During peak hours, Requests Per Second (RPS) can be twice the off-peak RPS. In the past, we maintained a fixed number of instances in the fleet, sized for peak hours, to ensure the fleet wouldn’t be under capacity. However, as RPS decreases during off-peak hours, most of our instances are underutilized. Since Pinterest is built on top of AWS, we decided to apply Amazon Auto Scaling to our service.
We first tried Amazon Auto Scaling in 2013, but the fleet didn’t scale up correctly. New instances need to launch in AWS with a Pinterest-specific image and then deploy service-specific code, but we saw new instances failing to install configuration or deploy service code. These failures prevented the instances from handling production traffic and the fleet from scaling up. The fleet would continue to run under capacity and eventually melt down as RPS increased. So, how did we make auto scaling work for us?
Building an auto scaling engine
Last year, we gave auto scaling another try and started building a new auto scaling engine. The following graph shows some major components of the system.
We built standard components into our auto scaling engine, such as a configuration console UI and scaling activity notifications, along with unique features that make it easy for engineers to adopt.
Auto scaling in tandem. AWS provides spot instances, which can be bid on at much lower prices than on demand instances. Since spot instances are more cost effective, we implemented a spot auto scaling group to run in tandem with the on demand auto scaling group, reducing the number of on demand instances running during peak hours. The high level structure is shown in the following graph. We also detail several features of our spot auto scaling group design.
- Spot instance ratio. The capacity of the spot auto scaling group should be smaller than the reserved instance capacity, to ensure there isn’t a major impact on the site if numerous spot instances are terminated by AWS due to a spot price surge. The auto scaling engine makes sure the number of spot instances doesn’t exceed this ratio.
- Scaling triggers. We use the same metric source for both auto scaling groups, with different scale up and scale down thresholds. Both thresholds for the spot auto scaling group are lower than the corresponding thresholds for the on demand auto scaling group. This ensures the spot auto scaling group scales more aggressively than the on demand auto scaling group.
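As a rough illustration of the two features above, the following Python sketch caps the spot group at a ratio of a baseline instance count and evaluates one shared metric against two sets of thresholds. The names (`GroupPolicy`, `decide`, `clamp_spot_target`), the baseline used for the ratio, and all threshold values are assumptions made for this example, not details of the Pinterest engine.

```python
from dataclasses import dataclass


@dataclass
class GroupPolicy:
    """Thresholds for one auto scaling group (hypothetical structure)."""
    scale_up_above: float    # add capacity when the metric exceeds this value
    scale_down_below: float  # remove capacity when the metric drops below this value


def decide(metric: float, policy: GroupPolicy) -> str:
    """Return 'scale_up', 'scale_down', or 'hold' for the current metric value."""
    if metric > policy.scale_up_above:
        return "scale_up"
    if metric < policy.scale_down_below:
        return "scale_down"
    return "hold"


def clamp_spot_target(desired_spot: int, baseline_count: int, spot_ratio: float) -> int:
    """Limit the spot group's target so it never exceeds spot_ratio * baseline_count."""
    ceiling = int(baseline_count * spot_ratio)
    return max(0, min(desired_spot, ceiling))


# Illustrative numbers only: both spot thresholds sit below the on demand ones,
# and both groups read the same metric (for example, average CPU percent).
ON_DEMAND = GroupPolicy(scale_up_above=60.0, scale_down_below=40.0)
SPOT = GroupPolicy(scale_up_above=50.0, scale_down_below=30.0)
```

With these example values, a metric reading of 55 makes the spot group scale up while the on demand group holds, and a reading of 35 makes the on demand group scale down while the spot group holds. In other words, spot capacity is added first and released last, and `clamp_spot_target` keeps it under the ratio either way.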
Reliable image build pipeline. To reduce instance launch time, we worked closely with Reliability engineers to improve our image build pipeline so that all necessary configurations are baked into our base image. This greatly improved the instance launch success rate and effectively reduced instance launch time.
Health check process. To ensure instances can be launched successfully, we built a health check process that, each time it starts, launches new instances with the latest image, configuration and service code. The process then runs a list of health check scripts that cover everything from system level network checks to application level service integration tests. Any of the following cases marks the health check as failed:
- Any of the system level or application level health check scripts fails
- Service code isn’t deployed successfully
- The instance can’t start to serve traffic within a predefined amount of time
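The three failure cases above can be sketched as a single evaluation function. This is a minimal Python illustration; `HealthCheckRun`, its fields, and the deadline parameter are hypothetical names chosen for the example rather than the engine's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HealthCheckRun:
    """Outcome of one health check run on a freshly launched instance (hypothetical)."""
    script_results: Dict[str, bool]    # script name -> True if the script passed
    code_deployed: bool                # True if service code deployed successfully
    seconds_to_serve: Optional[float]  # None if the instance never started serving


def failure_reasons(run: HealthCheckRun, serve_deadline_seconds: float) -> List[str]:
    """Collect every reason the run counts as failed; an empty list means it passed."""
    reasons = [
        f"health check script failed: {name}"
        for name, passed in sorted(run.script_results.items())
        if not passed
    ]
    if not run.code_deployed:
        reasons.append("service code was not deployed")
    if run.seconds_to_serve is None or run.seconds_to_serve > serve_deadline_seconds:
        reasons.append("instance did not serve traffic before the deadline")
    return reasons


def health_check_passed(run: HealthCheckRun, serve_deadline_seconds: float) -> bool:
    return not failure_reasons(run, serve_deadline_seconds)
```

Returning the list of reasons rather than a bare boolean is a design choice of this sketch: it lets a notification to the service owner say which case tripped.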
The health check process can be triggered in two ways:
- AMI triggered. Whenever an image is published to the auto scaling engine from the image build pipeline, a health check will start to verify whether the new image is safe to use. If the health check fails, the engine won’t update the group launch configuration and service owners will be notified.
- Time triggered. The process also runs periodically in each Auto Scaling group. When a health check fails, the auto scaling engine immediately disables the scaling down process for that group. The scaling down process resumes once the health check passes. When several consecutive health checks fail, an alert triggers and notifies the service owner.
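The two triggers above amount to a small piece of per-group state plus a gate on image updates. The Python sketch below shows one way to model that. `GroupHealthState`, `image_for_launch_config`, and the `alert_after` parameter are hypothetical, and the source does not say how many consecutive failures raise an alert.

```python
class GroupHealthState:
    """Tracks periodic health check results for one auto scaling group (hypothetical)."""

    def __init__(self, alert_after: int):
        self.alert_after = alert_after   # assumed: consecutive failures before alerting
        self.consecutive_failures = 0
        self.scale_down_enabled = True

    def record(self, check_passed: bool) -> bool:
        """Record one result; return True if the service owner should be alerted."""
        if check_passed:
            # A passing check resumes scaling down and resets the failure streak.
            self.consecutive_failures = 0
            self.scale_down_enabled = True
            return False
        # Any failure disables scaling down immediately.
        self.consecutive_failures += 1
        self.scale_down_enabled = False
        return self.consecutive_failures >= self.alert_after


def image_for_launch_config(current_image: str, new_image: str, check_passed: bool) -> str:
    """AMI triggered path: adopt the new image only if its health check passed."""
    return new_image if check_passed else current_image
```

In this sketch the alert keeps firing on every failure once the streak reaches `alert_after`. Whether the real engine alerts once or repeatedly is not stated in the text.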
Graceful shutdown. We integrated the AWS lifecycle hook into our auto scaling engine and deploy service. Service owners can provide a service specific STOP script to gracefully stop the service on an instance before AWS terminates it. Instead of terminating the instance immediately, AWS pauses the instance termination process and sends an SQS notification to the auto scaling engine, which marks the instance state as PENDING_TERMINATED and executes the STOP script on the instance. Once the STOP script is done, the auto scaling engine resumes the instance termination process, and the instance gets terminated by AWS.
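The shutdown flow can be sketched as a handler for one lifecycle notification. This Python example is an illustration under assumptions: the JSON field names follow the lifecycle hook notification format as we understand it from the AWS documentation and should be verified there, and `run_stop_script`, `complete_action`, and the `instance_states` dictionary are hypothetical stand-ins for the engine's own components.

```python
import json
from typing import Callable, Dict

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def handle_lifecycle_message(
    body: str,
    instance_states: Dict[str, str],
    run_stop_script: Callable[[str], None],
    complete_action: Callable[..., None],
) -> bool:
    """Process one SQS message body; return True if a termination was handled."""
    message = json.loads(body)
    if message.get("LifecycleTransition") != TERMINATING:
        return False  # ignore anything that is not a termination notification

    instance_id = message["EC2InstanceId"]
    instance_states[instance_id] = "PENDING_TERMINATED"
    try:
        run_stop_script(instance_id)  # service specific STOP script
    finally:
        # Assumption of this sketch: resume termination even if the STOP script
        # raises, so the instance does not wait for the hook timeout.
        complete_action(
            LifecycleHookName=message["LifecycleHookName"],
            AutoScalingGroupName=message["AutoScalingGroupName"],
            LifecycleActionToken=message["LifecycleActionToken"],
            LifecycleActionResult="CONTINUE",
            InstanceId=instance_id,
        )
    return True
```

The keyword arguments passed to `complete_action` match the parameters of the boto3 Auto Scaling client's `complete_lifecycle_action` method, so that method could be passed in directly. Injecting it as a callable keeps the handler testable without AWS access.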
Customized metrics to power auto scaling. In addition to the metrics provided by AWS, we added application metrics monitoring support. Service owners can use customized metrics as scale up and scale down indicators. To make developers’ lives easier, we also integrated our internal metrics system. Developers can specify the metric name on the auto scaling console, and a dedicated worker running periodically in the background pulls metric data from our own metrics system and sends it to Amazon CloudWatch.
Since enabling auto scaling, we’ve seen promising gains, as the average CPU utilization for our fleets is flat.
We’ve also enabled auto scaling for dozens of internal services that involve thousands of instances. As an example, take the above graph, which shows free reserved instance counts every day. The solid red line is the reserved instance threshold; above the red line are the available free reserved instances, and below the line are the on demand instances running. The red dashed line suggests that if auto scaling weren’t enabled, we’d consistently run a certain number of on demand instances every day. The red area in the graph is the instance hours we save from auto scaling on a daily basis. As you can see, auto scaling saves a significant number of instance hours every day, which leads to several million dollars in savings every year.
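One pass of such a background worker might look like the Python sketch below. It is a sketch under assumptions: `fetch_metric` stands in for the internal metrics system, `put_metric_data` is an injected callable, and the namespace and metric names are invented for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional


def build_metric_data(values: Dict[str, float], timestamp: datetime) -> List[dict]:
    """Convert metric name -> value pairs into CloudWatch style datum dictionaries."""
    return [
        {"MetricName": name, "Timestamp": timestamp, "Value": float(value), "Unit": "None"}
        for name, value in sorted(values.items())
    ]


def sync_once(
    metric_names: Iterable[str],
    fetch_metric: Callable[[str], float],   # hypothetical internal metrics lookup
    put_metric_data: Callable[..., None],   # e.g. a CloudWatch client's method
    namespace: str,
    now: Optional[datetime] = None,
) -> int:
    """Pull each configured metric once and publish the batch; return the datum count."""
    timestamp = now or datetime.now(timezone.utc)
    values = {name: fetch_metric(name) for name in metric_names}
    data = build_metric_data(values, timestamp)
    if data:
        put_metric_data(Namespace=namespace, MetricData=data)
    return len(data)
```

The `Namespace` and `MetricData` keyword arguments mirror the boto3 CloudWatch client's `put_metric_data` call, so the real client method can be supplied where this sketch takes a callable. A scheduler would then call `sync_once` at whatever interval the scaling policy needs.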
Acknowledgements: The primary contributors to the auto scaling engine include Linda Lo, Nick Dechant, Baogang Song and Jinru He from the Cloud Management Platform team, a subteam of the Infrastructure team that drives reliability, speed, efficiency and security for Pinterest and its infrastructure.