
Luke Skywalker: [interrupting] Will you shut up and listen to me! Shut down all the garbage mashers on the detention level, will ya? Do you copy? Shut down all the garbage mashers on the detention level! Shut down all the garbage mashers on the detention level!
C-3PO: [to R2-D2] No! Shut them all down, hurry!
The Official AWS Blog (great resource, by the way!) has a post describing how to reduce costs in EC2 by automatically shutting down instances while not in use.
The short version of the blog post is that you can achieve this sort of shutdown in two ways:
- Create a CloudWatch alarm to shut down when CPU usage is below a threshold for a period of time.
- Create an EventBridge trigger and Lambda to shut down on a schedule.
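To make the first option concrete, here is a minimal sketch of the parameters such an alarm might use. It only builds the keyword arguments for CloudWatch's `put_metric_alarm` call. The function name, the 5% threshold, the 60-minute idle window, and the alarm naming scheme are all assumptions for illustration, not values from the AWS post.

```python
def idle_cpu_alarm_params(instance_id, region, threshold_percent=5.0, idle_minutes=60):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The threshold and idle window are illustrative defaults (assumptions),
    not recommendations taken from the AWS blog post.
    """
    period_seconds = 300  # evaluate average CPU in 5-minute buckets
    return {
        "AlarmName": f"stop-idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        # Number of consecutive buckets that must stay below the threshold.
        "EvaluationPeriods": idle_minutes * 60 // period_seconds,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 alarm action that stops the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }

# Possible usage (requires boto3 and AWS credentials, so it is not run here):
#   boto3.client("cloudwatch").put_metric_alarm(
#       **idle_cpu_alarm_params("i-0123456789abcdef0", "us-east-1"))
```

Keeping the parameters in a plain function makes the threshold and window easy to review before anything touches a real account.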
For the first option, I would argue that most deployments have a more precise metric than CPU usage available, one that actually reflects the number of HTTP requests being served.
There are other guides on how to do this; I’ve looked at some, as I’ve been planning to do this for our Fargate instances (not EC2, obviously, but similar enough) in our test environments. However, it’s nice to have an official source on how to do this kind of shutdown.
The reason we want to do this on my project is to save on cloud costs. The savings probably aren’t that much, but they come from an area of the budget that is limited and needed for other things.
At any rate, option #2 (the schedule) better reflects what my team will want to do. We have very spiky usage, but when we do go to test an instance, we don’t want to have to wait for it to spin up. Since we have similar work hours, we’ll probably want to shut down the instances except for around 06:30 – 20:00 on weekdays. That way, they’re up whenever people are likely to be working and down at other times.
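As a rough sketch of that schedule, the logic a scheduled Lambda might run could look like the following. It assumes the test environment is an ECS service on Fargate that needs a single task while up, and that the caller passes the current time already converted to the team's local time zone (EventBridge rule schedules are, as far as I know, evaluated in UTC). The function names, the one-task count, and the cluster and service names are hypothetical.

```python
from datetime import datetime, time

WORK_START = time(6, 30)   # window start, local time
WORK_END = time(20, 0)     # window end, local time

def should_be_running(local_now):
    """Return True on weekdays between 06:30 (inclusive) and 20:00 (exclusive)."""
    is_weekday = local_now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORK_START <= local_now.time() < WORK_END

def apply_schedule(ecs_client, cluster, service, local_now):
    """Scale the service to 1 task inside the window and 0 outside it.

    ecs_client is expected to behave like boto3.client("ecs"); the task
    count of 1 is an assumption for a small test environment.
    """
    desired = 1 if should_be_running(local_now) else 0
    ecs_client.update_service(cluster=cluster, service=service, desiredCount=desired)
    return desired

# Possible usage inside a Lambda handler (not run here):
#   apply_schedule(boto3.client("ecs"), "test-cluster", "test-service", local_now)
```

Passing the client in as an argument keeps the time-window logic testable without AWS credentials.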
One difficulty I anticipate: what if someone tries to test at an odd hour? I don’t mind terribly if they need to manually start an instance; it should happen very infrequently. However, I’d like them to see a clear error message describing what’s going on. We won’t use this system in production for obvious reasons, but it would be nice if future devs years from now don’t get confused because they can’t reach the service.
So, I’m wondering if there’s a good way to dynamically point our ALB to a static file on an S3 bucket or something while the instances are down. It might be possible to set a static file as an error page in the ALB? Not sure yet. Clearly I have not yet given this more than cursory research, but it’s on my mind.
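One possibility worth checking is the ALB's fixed-response listener action, which returns a small inline body without any backend or S3 bucket. The sketch below only builds the action dictionary. The function name, the 503 status, and the page wording are assumptions, and the 1024-character body limit is my understanding of the API, so verify it before relying on it.

```python
def maintenance_action(message):
    """Build an ALB listener action that serves a small static notice.

    Nothing here has been tried against a real load balancer; it is a
    sketch of the fixed-response action shape only.
    """
    body = (
        "<html><body><h1>Test environment is shut down</h1>"
        f"<p>{message}</p></body></html>"
    )
    # Assumed limit: fixed-response bodies are capped at 1024 characters.
    if len(body) > 1024:
        raise ValueError("message is too long for a fixed-response body")
    return {
        "Type": "fixed-response",
        "FixedResponseConfig": {
            "StatusCode": "503",
            "ContentType": "text/html",
            "MessageBody": body,
        },
    }

# Possible usage when shutting down (requires boto3, not run here):
#   boto3.client("elbv2").modify_listener(
#       ListenerArn=listener_arn,  # hypothetical variable
#       DefaultActions=[maintenance_action("Start the service manually to test.")])
```

The same scheduled job that scales the service down could swap this action in, then restore the normal forward action when the service comes back up.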