When I push changes to `master`, the CI build is fine, but when it comes to hot deploying the container, I’ve started to receive:
> service sample-app-frontend-acme-stage was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xxxxxxxx-xxxxxxx-xxxx-xxxxxx has insufficient CPU units available. For more information, see the Troubleshooting section.
I’ve not altered the task definitions, so I’m wondering if something has not been cleaned up properly…
I could recreate the cluster - which I’m sure would solve the issue, but I do feel I should know how to resolve the issue properly.
Appreciate this is pretty close to a generic AWS question, but I’m trying to understand the things in the Terraform scripts that could affect CPU usage in ECS.
Any help appreciated.
With ECS, your ECS Cluster has a total amount of memory and CPU available that can be allocated to run your ECS Tasks. For example, say you have a cluster of 3 t2.medium instances: each t2.medium has 2 CPU cores and 4 GB of RAM, so your total available resources for the cluster are 6 CPU cores and 12 GB of RAM.
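To make that arithmetic concrete, here’s a quick sketch using the example numbers above (note that ECS measures CPU in “CPU units”, where 1 core = 1024 units):

```python
# Cluster capacity for the example above: 3 x t2.medium.
# ECS measures CPU in "CPU units": 1 vCPU/core = 1024 units.
NODES = 3
CPU_UNITS_PER_NODE = 2 * 1024   # t2.medium: 2 cores
MEMORY_MB_PER_NODE = 4 * 1024   # t2.medium: 4 GB

total_cpu_units = NODES * CPU_UNITS_PER_NODE
total_memory_mb = NODES * MEMORY_MB_PER_NODE

print(total_cpu_units)  # 6144 units = 6 cores
print(total_memory_mb)  # 12288 MB = 12 GB
```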
Whenever you run an ECS Service, you tell ECS how many ECS Tasks you intend to run. For example, maybe you run 2 ECS Tasks for your service and up it to 3 as your service load increases. For each ECS Task, ECS will attempt to place the Task on a node in the cluster that has enough resources to run it. For example, you might have created an ECS Task Definition whose cpu and memory properties are set to 1 CPU core (1024 CPU units) and 1 GB of RAM, respectively (docs).
If ECS can find a node in the ECS Cluster that has at least 1 GB of RAM and 1 CPU core available, it will run the task. If it can’t, it will give you the error message you saw, and even tell you which node was the closest candidate.
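As a greatly simplified sketch of that placement check (the real ECS Scheduler also weighs Availability Zones, placement strategies, and other factors):

```python
# Simplified sketch of the placement check described above: find a node
# with enough remaining CPU units and memory for the task, or report
# the closest candidate (mirroring the error message in this thread).

def place_task(nodes, task_cpu, task_mem):
    """nodes: list of dicts with remaining 'cpu' (units) and 'mem' (MB)."""
    for node in nodes:
        if node["cpu"] >= task_cpu and node["mem"] >= task_mem:
            return node["id"]  # task placed on this node
    # No node fits: report the closest matching container instance
    closest = max(nodes, key=lambda n: (n["cpu"], n["mem"]))
    raise RuntimeError(
        f"unable to place task; closest instance {closest['id']} "
        "has insufficient CPU units available"
    )

nodes = [
    {"id": "i-aaa", "cpu": 512, "mem": 2048},
    {"id": "i-bbb", "cpu": 1024, "mem": 1024},
]
print(place_task(nodes, task_cpu=1024, task_mem=1024))  # i-bbb
```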
To resolve the issue, you need visibility into the ECS Cluster to understand your cluster utilization. ECS exposes this information as CloudWatch Metrics, and the docs describe which metrics are available to you. You can use the AWS Web Console to view these metrics by going to your ECS Cluster and looking for the “Metrics” tab.
Clearly, your cluster doesn’t have enough available resources to run the desired ECS Task, so your next step will probably be to look at each ECS Node and review the currently running ECS Tasks. Ultimately, your options are to either:
- Terminate enough ECS Tasks to free up the resources you need
- Add additional resources to the cluster, either by launching more ECS nodes or by increasing the instance type in use for the cluster.
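For the second option, a rough way to size the cluster (a sketch with hypothetical numbers, and ignoring per-node packing, so treat it as a lower bound):

```python
import math

# Minimum nodes needed so that the total reserved CPU of all tasks
# fits within the cluster. Ignores fragmentation across nodes, so the
# real scheduler may need one more node than this lower bound.
def min_nodes(task_cpu_units, num_tasks, cpu_units_per_node):
    return math.ceil(task_cpu_units * num_tasks / cpu_units_per_node)

print(min_nodes(512, 6, 1024))  # 3 nodes at minimum
```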
Thanks @josh-padnick for the detailed help - it really helped me understand how to configure the services and tasks.
With my newfound understanding, I think that infrastructure-live-acme may have an issue with its default values. My understanding is still rudimentary, so I’ll ‘show my working’.
The latest revision has the ecs-cluster on a `t2.micro` instance (so 1 CPU core / 1024 CPU units). The sample-app-frontend-acme task def has `CPU: 512`, so won’t that mean that you can only have 2 running tasks at any one time?
Given that hot deploy is being done by `deployment_maximum_percent = 200%` by default, shouldn’t the task def have `CPU: 256` to allow for 4 running tasks during an update?
> The latest revision has the ecs-cluster on a t2.micro instance (so 1 CPU core / 1024 CPU units).
By default, we launch a cluster of 3 nodes. Are you saying your cluster has only a single node?
> The sample-app-frontend-acme task def has CPU: 512, so won’t that mean that you can only have 2 running tasks at any one time?
Yes, that’s correct.
> Given that hot deploy is being done by deployment_maximum_percent = 200% by default, shouldn’t the task def have CPU: 256 to allow for 4 running tasks during an update?
One of the goals of the ECS Scheduler (or any Docker cluster scheduler) is to place a new Task on the “best” node. The ECS Scheduler takes into account which Availability Zones other ECS Tasks for this ECS Service are running on, available resources on a given node, and other factors.
So on a 3-node cluster where each node has 1024 CPU units, with an ECS Task that reserves 512 CPU units, as you suggest, we can run up to 2 ECS Tasks per node, or a total of 6 ECS Tasks. If the ECS Service is running 3 ECS Tasks, then as part of a rolling deployment you can run up to 200% of that, or 6 ECS Tasks, so the cluster should give you what you need.
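The deployment math above, written out (assuming the default 3-node cluster and the values from this thread):

```python
# Rolling-deploy capacity check for the example above:
# 3 nodes x 1024 CPU units, tasks reserving 512 CPU units each,
# desired count 3, deployment_maximum_percent = 200.
NODES = 3
CPU_PER_NODE = 1024          # t2.micro: 1 core = 1024 units
TASK_CPU = 512
DESIRED_COUNT = 3
MAX_PERCENT = 200

tasks_per_node = CPU_PER_NODE // TASK_CPU          # 2 tasks per node
cluster_capacity = NODES * tasks_per_node          # 6 tasks max
peak_tasks = DESIRED_COUNT * MAX_PERCENT // 100    # 6 tasks during a deploy

print(peak_tasks <= cluster_capacity)  # True: the hot deploy fits
```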