Amazon EC2 Spot Instances are one of the ways to purchase EC2 capacity; the other two discussed here are On-Demand and Reserved Instances. Spot Instances are the cheapest of the three and are cost-effective for running fault-tolerant workloads. Before you start using Spot Instances, it’s important to understand that they can be interrupted and that different instance types have a different Frequency of Interruption. I highly recommend choosing the Spot instance types with the lowest Frequency of Interruption if your workload is time-sensitive. Continue reading to find out how to determine the most and least interrupted instance types.
What are Spot Instances?
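Amazon sells unused EC2 capacity as Spot Instances at a steep discount. Spot Instances were originally allocated through bidding, where the highest bidder kept the instances until outbid. Today the Spot price adjusts gradually with supply and demand, and you can optionally set the maximum price you are willing to pay. You keep the instances until Amazon needs the capacity back, for example to serve On-Demand requests, or until the Spot price rises above your maximum. Amazon gives only a two-minute interruption notice before reclaiming a Spot Instance, so interruptions can arrive with very little warning. For these reasons, Spot Instances are cheaper and less reliable than On-Demand and Reserved Instances.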
|Cost|Spot < Reserved < On-Demand|
|Reliability|Spot < On-Demand < Reserved|
Interruptions and Fault Tolerance
Workloads that use Spot Instances should be fault-tolerant and designed to recover from interruptions. If your workload uses Spot Instances and some of the nodes get terminated, the work running on the remaining active nodes shouldn’t be impacted. Also, the jobs that were running on the terminated nodes should be picked up automatically by the other nodes.
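The recovery behavior described above can be sketched in a few lines. This is only an illustration: the `reassign_jobs` function, the node and job identifiers, and the dictionary shape are all hypothetical, and a real scheduler (for example the one inside EMR) handles this for you in its own way.

```python
from collections import deque


def reassign_jobs(assignments, terminated_nodes):
    """Move jobs from terminated nodes onto the least-loaded surviving nodes.

    assignments: dict mapping node id -> list of job ids (hypothetical shape).
    terminated_nodes: set of node ids that were interrupted.
    Returns a new dict containing only the surviving nodes; the input is not modified.
    """
    # Copy the job lists of the surviving nodes so the caller's data stays intact.
    survivors = {
        node: list(jobs)
        for node, jobs in assignments.items()
        if node not in terminated_nodes
    }
    if not survivors:
        raise RuntimeError("no surviving nodes; jobs must wait for new capacity")

    # Collect every job that was running on a terminated node.
    orphaned = deque(
        job
        for node, jobs in assignments.items()
        if node in terminated_nodes
        for job in jobs
    )

    # Hand each orphaned job to whichever survivor currently has the fewest jobs.
    while orphaned:
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(orphaned.popleft())
    return survivors
```

Jobs on the surviving nodes are never touched, which matches the requirement that active nodes shouldn’t be impacted by the loss of their neighbors.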
Waiting for Spot instances
Because AWS sells only unused EC2 capacity as Spot Instances, there may be times when no Spot capacity is available for a particular instance type. During these busy periods, a Spot request can stay unfulfilled for an indefinite amount of time. Even after the request is fulfilled, AWS may still reclaim the instances to serve other demand.
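A running Spot Instance can watch for the reclaim signal. AWS publishes a Spot interruption notice in the instance metadata about two minutes before it reclaims the instance. The sketch below only parses such a notice and reports how much time is left; fetching the document from the metadata endpoint (including any IMDSv2 token handling) is left out, and the function name `seconds_until_interruption` is made up for illustration.

```python
import json
from datetime import datetime, timezone

# Instance metadata path where EC2 publishes the Spot interruption notice.
# It returns HTTP 404 until an interruption has been scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_body, now=None):
    """Parse an instance-action notice and return (action, seconds remaining).

    notice_body: JSON text such as
        {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
    now: optional timezone-aware datetime, mainly useful for testing.
    """
    notice = json.loads(notice_body)
    # The notice carries a UTC timestamp for when the action will happen.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return notice["action"], (deadline - now).total_seconds()
```

A worker could poll the endpoint every few seconds and, once a notice appears, use the remaining time to checkpoint its state or stop accepting new jobs.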
Frequency of Interruption
The number of available Spot Instances varies by instance type. Amazon has a different amount of spare capacity for each type, so some instance types are scarcer than others. In addition, demand for each instance type varies with its cost, OS, configuration, and region. For these reasons, some instance types are more likely to get interrupted than others. AWS refers to this as the Frequency of Interruption of an instance type. Spot Instances with a lower Frequency of Interruption are less likely to be interrupted and more likely to run for longer.
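To get a feel for what a higher Frequency of Interruption means for a cluster, here is a rough back-of-the-envelope sketch. It rests on two simplifying assumptions that are mine, not AWS’s: the published percentage is treated as the probability `p` that any single node is interrupted during the period of interest, and nodes are treated as independent, which real interruptions often are not. Both function names are hypothetical.

```python
from math import comb


def expected_nodes_lost(n_nodes, p):
    """Expected number of interrupted nodes under the independence assumption."""
    return n_nodes * p


def prob_losing_at_least(n_nodes, k, p):
    """Probability that at least k of n_nodes are interrupted (binomial model).

    p is the assumed per-node interruption probability for the period.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Sum the binomial probabilities for k, k+1, ..., n_nodes interruptions.
    return sum(
        comb(n_nodes, i) * p**i * (1 - p) ** (n_nodes - i)
        for i in range(k, n_nodes + 1)
    )
```

Under this model, a 20-node cluster at `p = 0.05` expects to lose about one node, while the same cluster at `p = 0.20` expects to lose about four, which is the kind of gap that separates an occasional retry from a job that never finishes.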
Spot Instance Advisor
AWS offers a tool called Spot Instance Advisor that shows the Frequency of Interruption of each instance type along with its savings compared to On-Demand rates. The tool is simple yet effective. We just have to select the region and operating system to view the configuration (vCPU and memory), Savings over On-Demand, and most importantly the Frequency of Interruption for each instance type. The tool also lets us filter by minimum vCPU and memory. The results can be sorted by Frequency of Interruption in ascending or descending order.
Spot Instance Advisor – https://aws.amazon.com/ec2/spot/instance-advisor/
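The filtering and sorting that the Advisor page performs can be mimicked on rows copied out of it. The sketch below assumes you have transcribed the rows yourself; the `AdvisorRow` class, the `rank_by_interruption` function, and the instance names used in the usage note are hypothetical. The bucket labels are the ones the Advisor displays.

```python
from dataclasses import dataclass

# Interruption buckets as displayed by the Spot Instance Advisor, lowest first.
BUCKETS = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]


@dataclass
class AdvisorRow:
    instance_type: str
    vcpu: int
    memory_gib: float
    savings_pct: int   # savings over On-Demand, in percent
    interruption: str  # one of BUCKETS


def rank_by_interruption(rows, min_vcpu=0, min_memory_gib=0.0, descending=False):
    """Filter rows by minimum size, then sort by Frequency of Interruption.

    Ties within a bucket are broken by higher savings first.
    """
    eligible = [
        row for row in rows
        if row.vcpu >= min_vcpu and row.memory_gib >= min_memory_gib
    ]
    sign = -1 if descending else 1
    return sorted(
        eligible,
        key=lambda row: (sign * BUCKETS.index(row.interruption), -row.savings_pct),
    )
```

For example, with made-up rows such as `AdvisorRow("example-a.large", 2, 8, 70, ">20%")` and `AdvisorRow("example-b.xlarge", 4, 16, 60, "<5%")`, calling `rank_by_interruption(rows)` puts the `<5%` row first, and `descending=True` reverses that to surface the most interrupted types.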
Most Interrupted Instance Types
We can sort the results by Frequency of Interruption in descending order to find the most interrupted instance types. An instance type’s Frequency of Interruption may be different in each region and for each OS. At the time of this writing, some of the most interrupted instance types running Linux/Unix in US East (N. Virginia) are x1e.8xlarge, r5a.12xlarge and r5.large. I would recommend against using these instance types as Spot Instances if the workload is time-sensitive.
Least Interrupted Instance Types
We can sort the results by Frequency of Interruption in ascending order to find the least interrupted instance types. At the time of writing, some of the least interrupted instance types running Linux/Unix in US East (N. Virginia) include m5.large, t3.micro, and m1.medium.
Whenever possible, it’s better to choose instance types with a lower Frequency of Interruption to run time-sensitive workloads.
I discovered the Spot Instance Advisor after a personal experience. I run big-data workloads on EMR clusters, and I noticed that I faced more interruptions whenever I used one specific instance type as Spot Instances. I was losing so many nodes that fault tolerance was ineffective. I then found the Spot Instance Advisor and realized that the extra interruptions came from using an instance type with a Frequency of Interruption greater than 20%. I switched to a different instance type and the cluster became more reliable. Now I make sure to check the Spot Instance Advisor before deciding on the instance type for Spot Instances, and I recommend that others do the same.
Amazon Spot Instances can be terminated by AWS at any time, thereby causing interruptions. Different instance types have different interruption rates. The Spot instances with lower Frequency of Interruption are more likely to run for longer. You can use Spot Instance Advisor to determine the Frequency of Interruption for each instance type.