Article: How to save 50-90% on compute with spot instances

By: Danielle Royston

Recently I participated in TelecomTV’s Telcos and Public Cloud Summit, where an audience poll asked about the main barriers between telcos and the public cloud:

[Chart: audience poll results. Source: TelecomTV]

As usual, the risk of vendor lock-in was regarded as the biggest barrier with 52% of the vote (eyeroll); but coming in a strong second was ‘concerns over public cloud pricing and associated costs,’ which received votes from 46% of the audience. Ray LeMaistre writes, “It is becoming an increasing worry, and there have long been suggestions that using the public cloud is not always as cost effective and flexible as it might seem.” 

Oh, brother! I can already hear the telco execs mumbling from here. Before you go running back to your on-premises data center, hold the phone. You’ll spend more money in the public cloud only if you’re doing it really, really wrong. So, instead of going back to the headache of managing on-premises stacks, how about you learn how to become a cost-optimization ninja?

Optimize your biggest cost: cloud compute

To start, let’s look at what is likely the largest category on your cloud bill: your spend on cloud computing resources. To determine the price of compute resources, Amazon Web Services (AWS), Google Cloud, and Microsoft Azure use these three pricing models:

  1. On-demand instances: Flexible-but-expensive, pay-as-you-go pricing with no long-term commitment;
  2. Reserved instances: Discounted rates for committing to specific instance types for a longer period, good for steady-state workloads; and
  3. Spot instances: Unused capacity, available by bid at fluctuating prices, for workloads that prioritize cost savings. Most suitable for flexible or fault-tolerant workloads.

AWS claims that customers can save as much as 90% by using AWS Spot Instances over On-Demand Instances, and 50-70% over its Reserved Instances. Google Cloud says its preemptible VM instances typically cost 70-80% less than on-demand rates, and 50% less than three-year reserved VMs. Telcos should totally optimize their compute usage alongside these pricing strategies, and when they do, they will save a ton of money.
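To make those percentages concrete, here is a minimal back-of-the-envelope sketch. The hourly rate and fleet size are made-up assumptions, not real prices from any hyperscaler; only the 50% and 90% discount figures come from the claims above.

```python
# Hypothetical figures for illustration only; real rates vary by provider,
# region, and instance type.
ON_DEMAND_HOURLY = 1.00   # assumed on-demand price per instance-hour (USD)
HOURS_PER_MONTH = 730     # average number of hours in a month
FLEET_SIZE = 100          # assumed number of instances


def monthly_cost(hourly_rate, instances=FLEET_SIZE, hours=HOURS_PER_MONTH):
    """Cost of running `instances` machines for `hours` at `hourly_rate`."""
    return hourly_rate * instances * hours


def discounted_rate(base_rate, discount):
    """Apply a fractional discount (0.0 to 1.0) to an hourly rate."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return base_rate * (1.0 - discount)


on_demand = monthly_cost(ON_DEMAND_HOURLY)
spot_low = monthly_cost(discounted_rate(ON_DEMAND_HOURLY, 0.50))   # 50% saving
spot_high = monthly_cost(discounted_rate(ON_DEMAND_HOURLY, 0.90))  # 90% saving

print(f"On-demand:      ${on_demand:,.0f}/month")
print(f"Spot (50% off): ${spot_low:,.0f}/month")
print(f"Spot (90% off): ${spot_high:,.0f}/month")
```

Swap in your own rates and fleet size to see what the range looks like on your bill.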

What’s the difference between reserved and spot instances?

Reserved and spot instances are different in three big ways:

  • Reservation vs. bidding. Reserved instances guarantee the availability of a specific instance type, region, price, and term in exchange for an advance commitment. Spot instances work on a bid model for unused instances. The price fluctuates based on supply and demand, with no commitments from you or your hyperscaler. If you bid higher than the current price, you get access to the instances. When the spot price surges above your bid, your instances may be interrupted and terminated with only a few minutes’ warning.
  • Cheaper vs. a lot cheaper. While reserved instances offer significant cost savings over on-demand instances, spot instances are massively cheaper than reserved instances. Because they are unused resources at the hyperscaler, they’re priced to sell. Spot prices are usually 50-70% lower than reserved instance pricing.
  • Steady vs. variable. Conventional wisdom is that reserved instances are best for applications that run continuously or have consistent usage patterns, whereas spot instances are suited for workloads that can handle interruptions or have flexible start and end times, like fault-tolerant batch processing or time-intensive workloads.
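The bid mechanics described in the first bullet can be sketched as a tiny simulation. This is a simplified model, not any hyperscaler's actual allocation logic: the price series is invented, the function name is hypothetical, and it assumes you pay the current market price (not your bid) for each hour you run.

```python
def simulate_spot(bid, hourly_prices):
    """Walk an hourly spot-price series under a simplified bid model.

    The instance runs in any hour where the price is at or below the bid,
    and is interrupted when the price rises above the bid while running.
    """
    running = False
    hours_run = 0
    cost = 0.0
    interruptions = 0
    for price in hourly_prices:
        if price <= bid:
            running = True
            hours_run += 1
            cost += price  # assumption: billed at market price, not the bid
        else:
            if running:
                interruptions += 1  # price surged past the bid mid-run
            running = False
    return {"hours_run": hours_run, "cost": round(cost, 2),
            "interruptions": interruptions}


# Invented price series (USD per hour) with two spikes above the bid.
prices = [0.30, 0.32, 0.45, 0.31, 0.29, 0.50, 0.28]
result = simulate_spot(bid=0.40, hourly_prices=prices)
print(result)
```

A higher bid means fewer interruptions but a higher ceiling on what you might pay, which is the trade-off a workload owner has to tune.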

Save money with reserved instances

All three hyperscalers offer some form of reserved instance. AWS calls them simply Reserved Instances, Azure has Reserved Virtual Machine (VM) Instances, and Google Cloud has reserved VM capacity. It’s all the same basic idea: if you make a long-term commitment (one to three years) to use a specific instance type from a hyperscaler, you’ll secure the resources and get a discounted hourly rate. The longer the commitment and the bigger the upfront payment you’re willing to make, the more money you’ll save. Is it a good strategy? Sure, for a number of reasons, including:

  1. Cost savings: It’s cheaper than on-demand instances.
  2. Capacity planning: You’ll have the necessary resources when you need them, guaranteed.
  3. Budget planning: You’ll have a better idea of your costs in advance.

The downside is that you’re locked into the spend. If your needs change, you may end up paying for capacity you don’t need. 
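One way to reason about that lock-in risk is a break-even check: a reservation bills for every hour of the term, while on-demand bills only for the hours you actually use. The sketch below uses hypothetical rates and ignores upfront payments and term length, so treat it as a rough guide rather than a pricing calculator.

```python
def break_even_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of the term you must actually use for a reservation to pay off."""
    return reserved_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly, reserved_hourly, expected_utilization):
    """Compare effective cost per hour of the term.

    On-demand is billed only for the hours used; the reservation is
    billed for every hour whether the capacity is used or not.
    """
    on_demand_cost = on_demand_hourly * expected_utilization
    if on_demand_cost < reserved_hourly:
        return "on-demand"
    return "reserved"


# Hypothetical rates: $1.00 on-demand vs. $0.60 reserved (a 40% discount).
threshold = break_even_utilization(1.00, 0.60)
print(f"Break-even utilization: {threshold:.0%}")
print("At 40% utilization:", cheaper_option(1.00, 0.60, 0.40))
print("At 90% utilization:", cheaper_option(1.00, 0.60, 0.90))
```

If your needs shrink enough to push utilization below the break-even point, the reservation ends up costing more than on-demand would have.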

A lot of people think using reserved instances is the safest approach to saving on compute. But is it really the BEST way to save? Not necessarily.

But save even MORE money with spot instances

AWS’s Spot Instances, Google Cloud’s Spot VMs or Preemptible VMs, and Azure’s Spot VMs offer unused compute capacity at significantly lower prices, like 90% lower. Why the steep discount? You can lose them to a higher bidder at any moment, with only a couple of minutes’ notice. If you’re willing and able to move your workloads to accommodate this dynamism, you can save big bucks.

Spot instances are a good fit when:

  1. You can tolerate interruptions and have fault-tolerant, stateless applications;
  2. Your workload has flexible start and end times, or can be scheduled during off-peak hours;
  3. You integrate spot instances with auto-scaling and load balancing for elasticity;
  4. You have redundancy and fault-tolerant architectures in place; and
  5. You can actively monitor and manage spot instance interruptions.

Spot instances come with risks due to price volatility and potential terminations. They may not be suitable for workloads with strict uptime requirements or real-time processing needs. You should consider these trade-offs and assess the suitability of spot instances based on your workload’s characteristics and requirements.
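As one illustration of monitoring for interruptions, AWS publishes a Spot interruption notice as a small JSON document through the instance metadata service. The sketch below only parses such a document and works out how much time is left; fetching it (including any metadata-service authentication) is left out, the function names are hypothetical, and the sample notice and timestamps are invented.

```python
import json
from datetime import datetime, timezone

# AWS instance metadata path for Spot interruption notices.
# Shown for reference only; this sketch does not make the HTTP request.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_interruption_notice(body):
    """Return (action, deadline) from a notice body, or None if there is none."""
    if not body:
        return None  # no notice pending
    notice = json.loads(body)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return notice["action"], deadline


def seconds_remaining(deadline, now):
    """Seconds left to drain work before the instance is reclaimed."""
    return max(0.0, (deadline - now).total_seconds())


# Invented sample notice and clock reading for illustration.
sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

action, deadline = parse_interruption_notice(sample)
print(action, seconds_remaining(deadline, now))
```

In practice a small agent would poll for the notice and, once one appears, stop taking new work, checkpoint state, and hand traffic to a replacement instance before the deadline.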

Most telco execs feel spot instances can’t be used for their workloads. But I know of a UK MVNO that kept an open mind on spot instances, figured out how to make them work, and is now reveling in the savings.

giffgaff saves big

Steve McDonald is COO and CTO at giffgaff, a UK mobile virtual network operator (MVNO). In Episode 68 of my Telco in 20 podcast, Steve shared how giffgaff uses AWS Spot Instances to save big money. He points out (at timestamp 12:25) that once giffgaff moved its operation to AWS, it moved its entire production estate to EC2 Spot Instances. To make it work for giffgaff, he says, “You have to be in a position where you can build a new server and insert it into the cluster. We do that now about 60-80 times a day.”

How cool is that? 

giffgaff tossed conventional wisdom out the window and refactored its entire production estate so it can run on AWS Spot Instances. You see, most people tend to overestimate the likelihood that a spot instance will be preempted and underestimate their tolerance for it. In reality, preemption is less common than you may think in most instance families. By being flexible about instance types, you can achieve significant cost savings without significant negative impact. With a well-designed spot usage strategy, giffgaff has likely been able to achieve the 50-90% savings Amazon claims. Essentially, by using AWS Spot Instances, giffgaff takes advantage of a market inefficiency in compute pricing while others overspend on Amazon Elastic Compute Cloud (EC2). That’s a smart move. Want to learn way more than you ever thought possible about AWS pricing inefficiencies with regard to spot instances? Read this great blog from Eric Pauley for more information on this cost optimization strategy.

The telcos and vendors that win big with the public cloud will be those that get really skilled at optimizing the cost of their workloads. Watch out if your teammate or vendor recommends running a workload exclusively on reserved instances. Discuss spot instances with them to see if you can squeeze out more savings and make your money go further. By saving on compute, you can stretch your budget and fund all those cool public cloud projects you need to work on, like genAI and, of course, Totogi!

Want more clever cost-saving tips like this? We do, too! In fact, TelcoDR is exploring an exclusive deal with software startup CloudFix to offer a cloud optimization tool for telcos that will save big money, too. We’ll be announcing it in a few weeks. Stay tuned, and keep on optimizing your cost in the cloud!