Cloud Native Development in Practice

Scaling

VMware Tanzu Labs

You will demonstrate how to scale your pal-tracker application running on Tanzu Application Service.

Learning outcomes

After completing the lab, you will be able to:

Demonstrate the ability to use autoscaling for an application on Tanzu Application Service.

Getting started

Review the Scaling slides.

Codebase

You must have completed (or fast-forwarded to) the Availability lab. You must have your pal-tracker application associated with the scaling-availability-solution codebase deployed and running on Tanzu Application Service.
In a terminal window, make sure you start in the ~/workspace/pal-tracker directory.

Monitoring

In this lab you will exercise your pal-tracker application under load, monitor it, and tune it.

You can monitor the pal-tracker application through the following:

Command line via the following cf commands:
- cf app pal-tracker
- cf events pal-tracker
Apps Manager user interface.

If you choose to monitor via the command line you will need a minimum of four terminal windows open.

If you choose to monitor with Apps Manager you will need only one.
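
For the command line option, one way to keep each view refreshing is to wrap the cf commands in `watch`, one per terminal window. The sketch below assumes `watch` is installed and that you are logged in with the cf CLI; the helper name `monitor_pal_tracker` is illustrative and not part of the lab.

```shell
# Hypothetical helper: refresh one monitoring view in the current terminal.
# view is "app" (state of each instance) or "events" (recent lifecycle events).
monitor_pal_tracker() {
  view="$1"
  case "$view" in
    app|events) watch cf "$view" pal-tracker ;;
    *) echo "usage: monitor_pal_tracker app|events" >&2; return 1 ;;
  esac
}

# Usage, one call per terminal window:
#   monitor_pal_tracker app
#   monitor_pal_tracker events
```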

Scaling `pal-tracker`

Tanzu Application Service supports scaling the number of application instances in 3 ways:

Command line through the cf scale pal-tracker -i <number of instances> command.
Setting the instances parameter in the manifest, and pushing the application.
Through the Tanzu Application Service Autoscaler.

You have already used the second option to achieve better availability characteristics of your application.
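
For comparison, the command line option might be wrapped as shown below. This is a minimal sketch that assumes a logged-in cf CLI session; the function name `scale_pal_tracker` is hypothetical.

```shell
# Hypothetical wrapper around the command line option: scale, then show the result.
scale_pal_tracker() {
  instances="$1"
  # Set the desired instance count, then list the state of each instance.
  cf scale pal-tracker -i "$instances" && cf app pal-tracker
}

# Usage (requires a logged-in cf CLI session):
#   scale_pal_tracker 3
```

Keep in mind that a later push with an instances value in the manifest sets the instance count again.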

Scaling up the number of pal-tracker instances requires two characteristics of the application:

Each pal-tracker application instance cannot persist state within its container.
The pal-tracker application supports a concurrent scale out model, meaning that it can run multiple application instances concurrently.

Now you will use the autoscaler to accommodate increased workload for the pal-tracker application.

Scenario

Pretend that you have been running your pal-tracker application in production for a while, and you have good insights into the runtime characteristics:

You know from experience you can run 10 requests-per-second (rps) comfortably on a given pal-tracker application instance.
You have stress tested your application, and you know the maximum work rate per instance when it may become unstable is 20 rps.
The current pal-tracker application has a relatively consistent workload of 10 rps throughout the day.
You forecast that in the next release you will have occasional daily peaks where the pal-tracker application may have to handle between 40 and 50 rps.

How many instances will you need to run at peak periods, without factoring in availability?
```
(target rps) / (rps/instance) = number of instances
```
Or
```
(50 rps) / (10 rps/instance) = 5 instances
```
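
The 50 rps case divides evenly. For a forecast that does not, round up, because you cannot deploy a fraction of an instance. A minimal shell sketch of that calculation follows; the function name `instances_needed` is illustrative and not part of the lab.

```shell
# Hypothetical helper: instances needed to serve a target load,
# rounded up to a whole instance (integer ceiling division).
instances_needed() {
  target_rps="$1"        # forecast peak load in requests per second
  rps_per_instance="$2"  # comfortable load for a single instance
  echo $(( (target_rps + rps_per_instance - 1) / rps_per_instance ))
}

instances_needed 50 10   # upper end of the forecast peak
instances_needed 45 10   # a peak that does not divide evenly
```

Both calls print 5.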
Factoring in the need for availability, you have learned in production that under normal conditions you have sufficient redundancy with 2 extra instances. So, you never want to run fewer than 3 instances.

You now know based on your stress testing that planned and/or unplanned outage of individual instances will be sufficiently tolerated with a total of 5 instances at 50 rps. You can see this by considering that even if 2 of the 5 instances become unavailable, the overall throughput would be:
```
(maximum rps/instance) * (number of instances) = max throughput
```
or
```
(20 rps) * (3 instances) = 60 rps
```
This is still greater than the maximum required throughput of 50 rps.
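
The same check can be repeated for other outage sizes. The sketch below uses the scenario's numbers; the helper name `survives_outage` is an assumption made for illustration.

```shell
# Hypothetical helper: succeeds when the instances that remain after an outage
# can still carry the peak load at the stress-tested maximum rate.
survives_outage() {
  total="$1"; lost="$2"; max_rps_per_instance="$3"; peak_rps="$4"
  remaining=$(( total - lost ))
  capacity=$(( remaining * max_rps_per_instance ))
  echo "remaining=${remaining} capacity=${capacity}rps peak=${peak_rps}rps"
  [ "$capacity" -ge "$peak_rps" ]
}

survives_outage 5 2 20 50 && echo "tolerated"
survives_outage 5 3 20 50 || echo "not tolerated"
```

Losing 2 of 5 instances leaves 60 rps of capacity, which is tolerated. Losing 3 leaves only 40 rps, which is not.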

Enable application autoscaling

Tanzu Application Service supports automatic horizontal scaling based on either pre-defined or custom rules.

For request/response (blocking) web applications, HTTP throughput is a good choice assuming you have a solid grasp of the performance, stability, and scaling characteristics of your app.

If you are running the labs on your own development machine, you will need to install the Tanzu Application Service Autoscaler CLI plugin.
You have been supplied with a set up script that will configure an autoscaling rule for you with the following characteristics:
- Minimum number of instances: 3
- Maximum number of instances: 5
- Threshold when to scale up: 10 rps
- Threshold when to scale down: 5 rps
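
To build intuition for how such a rule behaves, here is a simplified model of a threshold decision. It assumes the thresholds apply to the average rps per instance, as the scenario's sizing does, and that the instance count changes one step at a time. It is not the autoscaler's actual algorithm, and the function name `next_instance_count` is hypothetical.

```shell
# Simplified model of a threshold rule; NOT the autoscaler's real algorithm.
# Prints the instance count to move to, one step at a time.
next_instance_count() {
  total_rps="$1"; current="$2"
  min=3; max=5            # instance limits from the rule above
  up=10; down=5           # rps-per-instance thresholds from the rule above
  if [ "$total_rps" -gt $(( up * current )) ] && [ "$current" -lt "$max" ]; then
    echo $(( current + 1 ))   # average load above 10 rps: add an instance
  elif [ "$total_rps" -lt $(( down * current )) ] && [ "$current" -gt "$min" ]; then
    echo $(( current - 1 ))   # average load below 5 rps: remove an instance
  else
    echo "$current"           # within thresholds, or already at a limit
  fi
}

next_instance_count 50 3   # peak load on the minimum footprint
next_instance_count 50 5   # peak load at the maximum
next_instance_count 10 5   # normal load after the peak
```

The three calls print 4, 5, and 4: scale up under peak load, hold at the maximum, then scale down once the load returns to normal.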
You can review the set up script to see how the autoscaler CLI works:
```
git show scaling-availability-start:scripts/setup-auto-scaling.sh
```
Run the setup to enable the autoscaler:
```
./scripts/setup-auto-scaling.sh
```

Run the following autoscaler watch command:

watch cf autoscaling-events pal-tracker

Observe autoscaling

From a separate terminal window, run a load test:

Note that NUM_USERS is now set to 100 users, and REQUESTS_PER_SECOND is set to 50 rps.

It is critical to run with these new settings instead of the previous load test runs, otherwise the autoscaler will not work in line with the expectations of this lab.
```
docker run -i -t --rm -e DURATION=300 -e NUM_USERS=100 -e REQUESTS_PER_SECOND=50 -e URL=http://pal-tracker-${UNIQUE_IDENTIFIER}.${DOMAIN} pivotaleducation/loadtest
```
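
The URL in the command above depends on the UNIQUE_IDENTIFIER and DOMAIN environment variables. If either is unset, the URL will not point at your application. A small preflight check, sketched here with a hypothetical `loadtest_url` helper, can catch that before the load test starts.

```shell
# Hypothetical preflight: build the load test URL, failing if a variable is unset.
loadtest_url() {
  if [ -z "${UNIQUE_IDENTIFIER:-}" ] || [ -z "${DOMAIN:-}" ]; then
    echo "UNIQUE_IDENTIFIER and DOMAIN must both be set" >&2
    return 1
  fi
  echo "http://pal-tracker-${UNIQUE_IDENTIFIER}.${DOMAIN}"
}

# Usage: run the load test only if the URL can be built.
#   URL="$(loadtest_url)" && docker run -i -t --rm -e DURATION=300 -e NUM_USERS=100 \
#     -e REQUESTS_PER_SECOND=50 -e URL="$URL" pivotaleducation/loadtest
```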
Observe both the pal-tracker watch and autoscaler watch terminal windows.

How long does it take before the autoscaler scales up to the 5 instance limit?
Let your load test complete, or terminate it with Ctrl+C.

Turn off autoscaling

Observe both the pal-tracker watch and autoscaler watch terminal windows.

How long does it take before the autoscaler scales down to the minimum 3 instances?
Turn off the autoscaler:
```
./scripts/turn-off-autoscaling.sh
```
Terminate the cf app and cf events watch windows.

Autoscaling limitations

You saw that the autoscaling behavior is not instantaneous. It is designed conservatively around the concept of a Governor, an algorithm that limits the rate of change within the autoscaler to prevent potential outages if it is not tuned or used correctly.
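
One simple way to picture a rate limit of this kind is a cooldown between scaling actions. The sketch below illustrates the general idea only; it is not the Tanzu Application Service autoscaler's implementation, and the 60 second cooldown is a made-up value.

```shell
# Illustrative cooldown check; NOT the Tanzu autoscaler's implementation.
# Succeeds only when enough time has passed since the last scaling action.
governor_allows() {
  last_change="$1"   # time of the last scaling action, in epoch seconds
  now="$2"           # current time, in epoch seconds
  cooldown="$3"      # minimum seconds between scaling actions (hypothetical value)
  [ $(( now - last_change )) -ge "$cooldown" ]
}

governor_allows 1000 1030 60 || echo "hold: only 30s since last change"
governor_allows 1000 1090 60 && echo "allow: 90s since last change"
```

A delay like this trades responsiveness for safety, which is why the scale-up you observed took a while to reach the 5 instance limit.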

The Scenario assumed that you have significant knowledge about the performance, stability, scaling and capacity usage of your application. The scenario in this lab is actually quite naive. It is up to you to gain familiarity with your production application characteristics using your observability tools, and also to do empirical testing to verify the behaviors that you expect to encounter in production.

If you do not have this background and knowledge, do not use the autoscaler!

See the Reddit outage postmortem announcement related to autoscaling.

It is a sobering read that should give you pause when choosing to run an autoscaler, as well as operate it.

Another good read on the subject is Release It! Second Edition, Chapter 4 - Stability Antipatterns → Force Multiplier and Chapter 5 - Stability Patterns → Governor.

A specific limitation for the Tanzu Application Service autoscaler is that the CPU rules are not reliable. See this advisory for more information.

Wrap up

Now that you have completed the lab, you should be able to:

Demonstrate the ability to use autoscaling for an application on Tanzu Application Service.
